(54) Title: STAPHYLOCOCCUS EPIDERMIDIS NUCLEIC ACIDS AND PROTEINS

(57) Abstract: S. epidermidis polypeptides and DNA (RNA) encoding such polypeptides, and a procedure for
producing such polypeptides by recombinant techniques, are disclosed. Also disclosed are methods for
utilizing such polypeptides and DNA (RNA) for the treatment of infection, particularly infections arising
from S. epidermidis. Antagonists against the function of such polypeptides and their use as therapeutics
to treat infection are also disclosed. Also disclosed are diagnostic assays for detecting diseases related
to the presence of S. epidermidis nucleic acid sequences and the polypeptides in a host. Also disclosed are
diagnostic assays for detecting polynucleotides and polypeptides related to S. epidermidis.

STAPHYLOCOCCUS EPIDERMIDIS NUCLEIC ACIDS AND PROTEINS



Field of the Invention 

The present invention provides nucleic acids, and peptides, polypeptides 
and proteins encoded by the nucleic acids, isolated from Staphylococcus 
epidermidis.

Background of the Invention

Staphylococcus epidermidis is a gram-positive bacterium present in the
normal flora of humans, and is typically present on the skin. It is catalase positive
and grows aerobically. It is implicated in various human conditions and diseases,
including subacute bacterial endocarditis (Baddour LM et al., Production of
experimental endocarditis by coagulase-negative staphylococci: variability in
species virulence, J. Infect. Dis. 150: 721-727, 1984; Karchmer AW, Archer GL,
Dismukes WE, Staphylococcus epidermidis causing prosthetic valve endocarditis:
microbiologic and clinical observations as guides to therapy, Ann. Intern. Med.
1983;98:447-455) and septicemia (Christensen GD et al., Nosocomial septicemia
due to multiply antibiotic-resistant Staphylococcus epidermidis, Ann. Intern. Med.
96: 1-10, 1982). S. epidermidis is estimated to be responsible for about 12% of all
hospital patient infections. Because of the organism's peculiar ability to colonize
polymer and metallic surfaces, infection correlates with the insertion
of intravenous lines or catheters or implantation of prosthetic devices. Treatment
can be difficult since different isolates of S. epidermidis show a broad spectrum of
antibiotic resistance. The organism also produces a polysaccharide biofilm which
helps to protect the bacteria from the human immune system (Tojo M et al.,
Isolation and characterization of a capsular polysaccharide adhesin from
Staphylococcus epidermidis, J. Infect. Dis. 157: 713-722, 1988).

The present invention advantageously provides isolated nucleic acids and
their encoded peptides, polypeptides and proteins from the genome of S.
epidermidis, as well as the genomic map of S. epidermidis. Thus, the present
invention fulfils a widely-felt need for S. epidermidis diagnostics, antigens, and
products useful in procedures for preparing antibodies and for identifying
compounds effective against S. epidermidis infection. Selected nucleic acids
and/or polypeptides of the present invention can be advantageously utilized as
targets in screening assays for antibiotics, as diagnostics of infections, and as
means to identify S. epidermidis in any given sample and distinguish it from other
bacteria.

SUMMARY OF THE INVENTION 

The present invention provides an isolated polynucleotide comprising a
member selected from the group consisting of:

(a) a polynucleotide encoding a polypeptide having at least a 70%
identity to a polypeptide set forth in the Sequence Listing;

(b) a polynucleotide which is complementary to the polynucleotide of (a);
and

(c) a polynucleotide comprising at least 15 sequential bases of the
polynucleotide of (a) or (b).

The present invention further provides polypeptides
encoded by these polynucleotides and methods of using the polynucleotides and
polypeptides.

DETAILED DESCRIPTION OF THE INVENTION

GLOSSARY 

The following illustrative explanations are provided to facilitate understanding 
of certain terms used frequently herein, particularly in the Examples. The 
explanations are provided as a convenience and are not limitative of the invention. 

BINDING MOLECULE refers to a molecule or ion which binds or interacts
specifically with polypeptides or polynucleotides of the present invention, including,
for example, enzyme substrates, cell membrane components and classical receptors.
Binding between polypeptides (or polynucleotides) of the invention and such
molecules may be exclusive to polypeptides of the invention, which is preferred, or it
may be highly specific for polypeptides of the invention, which is also preferred, or it
may be highly specific to a group of proteins that includes polypeptides of the
invention, which is preferred, or it may be specific to several groups of proteins at
least one of which includes a polypeptide of the invention. Binding molecules also
include antibodies and antibody-derived reagents that bind specifically to
polypeptides of the invention.

GENETIC ELEMENT generally means a polynucleotide comprising a region
that encodes a polypeptide or a polynucleotide region that regulates replication,
transcription or translation or other processes important to expression of the
polypeptide in a host cell, or a polynucleotide comprising both a region that encodes
a polypeptide and a region operably linked thereto that regulates expression.
Genetic elements may be comprised within a vector that replicates as an episomal
element; that is, as a molecule physically independent of the host cell genome. They
may be comprised within plasmids. Genetic elements also may be comprised within
a host cell genome, not in their natural state but, rather, following manipulation such
as isolation, cloning and introduction into a host cell in the form of purified DNA or in
a vector, among others.

HOST CELL is a cell which has been transformed or transfected, or is 
capable of transformation or transfection by an exogenous polynucleotide 
sequence. 

IDENTITY, as known in the art, is the relationship between two or more
polypeptide sequences or two or more polynucleotide sequences, as determined by
comparing the sequences. In the art, identity also means the degree of sequence
relatedness between polypeptide or polynucleotide sequences, as the case may
be, as determined by the match between strings of such sequences. Identity can
be readily calculated (Computational Molecular Biology, Lesk, A.M., ed., Oxford
University Press, New York, 1988; Biocomputing: Informatics and Genome
Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis
of Sequence Data, Part I, Griffin, A.M., and Griffin, H.G., eds., Humana Press,
New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G.,
Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and
Devereux, J., eds., M Stockton Press, New York, 1991). While there exist a
number of methods to measure identity between two polynucleotide or two
polypeptide sequences, the term is well known to skilled artisans (Sequence
Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; Sequence
Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New
York, 1991; and Carillo, H., and Lipman, D., SIAM J. Applied Math., 48: 1073
(1988)). Methods commonly employed to determine identity between sequences
include, but are not limited to, those disclosed in Carillo, H., and Lipman, D., SIAM
J. Applied Math., 48:1073 (1988). Preferred methods to determine identity are
designed to give the largest match between the sequences tested. Methods to
determine identity are codified in computer programs. Preferred computer
program methods to determine identity between two sequences include, but are
not limited to, GCG program package (Devereux, J., et al., Nucleic Acids
Research 12(1): 387 (1984)), BLASTP, BLASTN, and FASTA (Altschul, S.F. et al.,
J. Molec. Biol. 215:403 (1990)).
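By way of illustration only, the notion of identity as "the match between strings" can be sketched
with a small global alignment followed by a count of identical columns. The sketch below is not the
GCG, BLAST or FASTA implementation; the function names, the scoring values (match +1, mismatch -1,
gap -1) and the choice of alignment length as the denominator are assumptions made for the example.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings.

    The scoring values are illustrative assumptions, not values from the disclosure.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of the alignment length."""
    x, y = global_align(a, b)
    if not x:
        return 0.0
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)
```

Other conventions for the denominator (for example, the length of the shorter sequence) give
different percentages, so any threshold such as "at least 70% identity" should be read together
with the program and parameters actually used.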
ISOLATED means separated "by the hand of man" from its natural state; i.e.,
that, if it occurs in nature, it has been changed or removed from its original
environment, or both. For example, a naturally occurring polynucleotide or a
polypeptide naturally present in a living organism in its natural state is not "isolated,"
but the same polynucleotide or polypeptide separated from the coexisting materials
of its natural state is "isolated", as the term is employed herein. As part of or
following isolation, such polynucleotides can be joined to other polynucleotides, such
as DNAs, for mutagenesis, to form fusion proteins, and for propagation or expression
in a host, for instance. The isolated polynucleotides, alone or joined to other
polynucleotides such as vectors, can be introduced into host cells, in culture or in
whole organisms. Introduced into host cells in culture or in whole organisms, such
DNAs still would be isolated, as the term is used herein, because they would not be
in their naturally occurring form or environment. Similarly, the polynucleotides and
polypeptides may occur in a composition, such as a media formulation, solutions for
introduction of polynucleotides or polypeptides, for example, into cells, compositions
or solutions for chemical or enzymatic reactions, for instance, which are not naturally
occurring compositions, and therein remain isolated polynucleotides or polypeptides
within the meaning of that term as it is employed herein.

POLYNUCLEOTIDE(S) generally refers to any polyribonucleotide or
polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or
DNA. Thus, for instance, polynucleotides as used herein refers to, among others,
single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded
regions or single-, double- and triple-stranded regions, single- and double-stranded
RNA, and RNA that is a mixture of single- and double-stranded regions,
hybrid molecules comprising DNA and RNA that may be single-stranded or, more
typically, double-stranded, or triple-stranded, or a mixture of single- and double-stranded
regions. In addition, polynucleotide as used herein refers to triple-stranded
regions comprising RNA or DNA or both RNA and DNA. The strands in such
regions may be from the same molecule or from different molecules. The regions
may include all of one or more of the molecules, but more typically involve only a
region of some of the molecules. One of the molecules of a triple-helical region often
is an oligonucleotide. As used herein, the term polynucleotide includes DNAs or
RNAs as described above that contain one or more modified bases. Thus, DNAs or
RNAs with backbones modified for stability or for other reasons are "polynucleotides"
as that term is intended herein. Moreover, DNAs or RNAs comprising unusual
bases, such as inosine, or modified bases, such as tritylated bases, to name just two
examples, are polynucleotides as the term is used herein. It will be appreciated that
a great variety of modifications have been made to DNA and RNA that serve many
useful purposes known to those of skill in the art. The term polynucleotide as it is
employed herein embraces such chemically, enzymatically or metabolically modified
forms of polynucleotides, as well as the chemical forms of DNA and RNA
characteristic of viruses and cells, including simple and complex cells, inter alia. The
term polynucleotide also embraces short polynucleotides often referred to as
oligonucleotide(s). "Polynucleotide" and "nucleic acid" are often used
interchangeably herein.

POLYPEPTIDES, as used herein, includes all polypeptides as described
below. The basic structure of polypeptides is well known and has been described in
innumerable textbooks and other publications in the art. In this context, the term is
used herein to refer to any peptide or protein comprising two or more amino acids
joined to each other in a linear chain by peptide bonds. As used herein, unless
otherwise indicated, the term refers to both short chains, which also commonly are
referred to in the art as peptides, oligopeptides and oligomers, for example, and to
longer chains, which generally are referred to in the art as proteins, of which there
are many types. It will be appreciated that polypeptides often contain amino acids
other than the 20 amino acids commonly referred to as the 20 naturally occurring
amino acids, and that many amino acids, including the terminal amino acids, may be
modified in a given polypeptide, either by natural processes, such as processing and
other post-translational modifications, or by chemical modification techniques
which are well known to the art. Even the common modifications that occur naturally
in polypeptides are too numerous to list exhaustively here, but they are well
described in basic texts and in more detailed monographs, as well as in a
voluminous research literature, and they are well known to those of skill in the art.
Among the known modifications which may be present in polypeptides of the present
invention are, to name an illustrative few, acetylation, acylation, ADP-ribosylation,
amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent
attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or
lipid derivative, covalent attachment of phosphatidylinositol, cross-linking, cyclization,
disulfide bond formation, demethylation, formation of covalent cross-links, formation
of cystine, formation of pyroglutamate, formylation, gamma-carboxylation,
glycosylation, GPI anchor formation, hydroxylation, iodination, methylation,
myristoylation, oxidation, proteolytic processing, phosphorylation, prenylation,
racemization, selenoylation, sulfation, transfer-RNA mediated addition of amino
acids to proteins such as arginylation, and ubiquitination. Such modifications are
well known to those of skill and have been described in great detail in the scientific
literature. Several particularly common modifications, glycosylation, lipid attachment,
sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation and
ADP-ribosylation, for instance, are described in most basic texts, such as, for instance,
PROTEINS - STRUCTURE AND MOLECULAR PROPERTIES, 2nd Ed., T. E.
Creighton, W. H. Freeman and Company, New York (1993). Many detailed reviews
are available on this subject, such as, for example, those provided by Wold, F.,
Posttranslational Protein Modifications: Perspectives and Prospects, pgs. 1-12 in
POSTTRANSLATIONAL COVALENT MODIFICATION OF PROTEINS, B. C.
Johnson, Ed., Academic Press, New York (1983); Seifter et al., Meth. Enzymol.
182:626-646 (1990) and Rattan et al., Protein Synthesis: Posttranslational
Modifications and Aging, Ann. N.Y. Acad. Sci. 663: 48-62 (1992). It will be
appreciated, as is well known and as noted above, that polypeptides are not always
entirely linear. For instance, polypeptides may be branched or circular, generally as a
result of posttranslational events, including natural processing events and events brought
about by human manipulation which do not occur naturally. Circular, branched and
branched circular polypeptides may be synthesized by non-translation natural
processes and by entirely synthetic methods, as well. Modifications can occur
anywhere in a polypeptide, including the peptide backbone, the amino acid
side-chains and the amino or carboxyl termini. In fact, blockage of the amino or carboxyl
group in a polypeptide, or both, by a covalent modification, is common in naturally
occurring and synthetic polypeptides and such modifications may be present in
polypeptides of the present invention, as well. For instance, the amino terminal
residue of polypeptides made in E. coli or other cells, prior to proteolytic processing,
almost invariably will be N-formylmethionine. During post-translational modification
of the peptide, a methionine residue at the NH2-terminus may be deleted.
Accordingly, this invention contemplates the use of both the methionine-containing
and the methionineless amino terminal variants of the protein of the invention.
The modifications that occur in a polypeptide often will be a function of how it is
made. For polypeptides made by expressing a cloned gene in a host, for instance,
the nature and extent of the modifications in large part will be determined by the host
cell posttranslational modification capacity and the modification signals present in the
polypeptide amino acid sequence. For instance, as is well known, glycosylation
often does not occur in bacterial hosts such as, for example, E. coli. Accordingly,
when glycosylation is desired, a polypeptide should be expressed in a glycosylating
host, generally a eukaryotic cell. Insect cells often carry out the same
posttranslational glycosylations as mammalian cells and, for this reason, insect cell
expression systems have been developed to express efficiently mammalian proteins
having native patterns of glycosylation, inter alia. Similar considerations apply to
other modifications. It will be appreciated that the same type of modification may be
present in the same or varying degree at several sites in a given polypeptide. Also, a
given polypeptide may contain many types of modifications. In general, as used
herein, the term polypeptide encompasses all such modifications, particularly those
that are present in polypeptides synthesized recombinantly by expressing a
polynucleotide in a host cell.

VARIANT(S) of polynucleotides or polypeptides, as the term is used herein,
are polynucleotides or polypeptides that differ from a reference polynucleotide or
polypeptide, respectively. Variants in this sense are described below and elsewhere
in the present disclosure in greater detail. (1) A polynucleotide that differs in
nucleotide sequence from another, reference polynucleotide. Generally, differences
are limited so that the nucleotide sequences of the reference and the variant are
closely similar overall and, in many regions, identical. As noted below, changes in
the nucleotide sequence of the variant may be silent. That is, they may not alter the
amino acids encoded by the polynucleotide. Where alterations are limited to silent
changes of this type a variant will encode a polypeptide with the same amino acid
sequence as the reference. Also as noted below, changes in the nucleotide
sequence of the variant may alter the amino acid sequence of a polypeptide
encoded by the reference polynucleotide. Such nucleotide changes may result in
amino acid substitutions, additions, deletions, fusions and truncations in the
polypeptide encoded by the reference sequence, as discussed below. (2) A
polypeptide that differs in amino acid sequence from another, reference polypeptide.
Generally, differences are limited so that the sequences of the reference and the
variant are closely similar overall and, in many regions, identical. A variant and
reference polypeptide may differ in amino acid sequence by one or more
substitutions, additions, deletions, fusions and truncations, which may be present in
any combination.

Techniques are available to evaluate temporal gene expression in
bacteria, particularly as it applies to viability under laboratory and host infection
conditions. A number of methods can be used to identify genes which are
essential to survival per se, or essential to the establishment/maintenance of an
infection. Identification of expression of a sequence by one of these methods
yields additional information about its function and permits the selection of such
sequence for further development as a screening target. Briefly, these
approaches include:

1) Signature Tagged Mutagenesis (STM) 

This technique is described by Hensel et al., Science 269: 400-403 (1995),
the contents of which are incorporated by reference for background purposes.
Signature tagged mutagenesis identifies genes necessary for the
establishment/maintenance of infection in a given infection model.

The basis of the technique is the random mutagenesis of the target organism
by various means (e.g., transposons) such that unique DNA sequence tags are
inserted in close proximity to the site of mutation. The tags from a mixed
population of bacterial mutants and bacteria recovered from an infected host are
detected by amplification, radiolabeling and hybridization analysis. Mutants
attenuated in virulence are revealed by absence of the tag from the pool of
bacteria recovered from infected hosts.
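The read-out of signature tagged mutagenesis reduces to a comparison of two tag pools: tags
detected in the inoculated pool but not in the pool recovered from the host mark candidate
attenuated mutants. The following is a minimal sketch under the assumption that the hybridization
signals have already been reduced to one number per tag; the function name, the tag labels and the
detection threshold are hypothetical and are not taken from Hensel et al.

```python
def attenuated_tags(input_signal, recovered_signal, threshold=1.0):
    """Return tags detected in the input pool but absent from the recovered pool.

    input_signal, recovered_signal: dict mapping tag name -> hybridization signal.
    threshold: hypothetical cut-off above which a tag counts as detected.
    """
    detected_in = {tag for tag, s in input_signal.items() if s >= threshold}
    detected_out = {tag for tag, s in recovered_signal.items() if s >= threshold}
    # A tag that never hybridized in the input pool is uninformative and is ignored.
    return sorted(detected_in - detected_out)


if __name__ == "__main__":
    inoculum = {"tag01": 5.0, "tag02": 4.2, "tag03": 0.2}
    recovered = {"tag01": 4.8, "tag02": 0.1}
    print(attenuated_tags(inoculum, recovered))
```

In this toy input only the mutant carrying tag02 is flagged; tag03 gave no usable signal in the
inoculum, so its absence after infection says nothing about virulence.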

In Streptococcus pneumoniae, because the transposon system is less well
developed, a more efficient way of creating the tagged mutants is to use the
insertion-duplication mutagenesis technique as described by Morrison et al., J.
Bacteriol. 159:870 (1984), the contents of which are incorporated by reference for
background purposes.

2) In Vivo Expression Technology (IVET) 

This technique is described by Camilli et al., Proc. Natl. Acad. Sci. USA
91:2634-2638 (1994), the contents of which are incorporated by reference for
background purposes. IVET identifies genes up-regulated during infection when
compared to laboratory cultivation, implying an important role in infection.
Sequences identified by this technique are implied to have a significant role in
infection establishment/maintenance.

In this technique random chromosomal fragments of the target organism are
cloned upstream of a promoter-less reporter gene in a plasmid vector. The pool is
introduced into a host and at various times after infection bacteria may be
recovered and assessed for the presence of reporter gene expression. The
chromosomal fragment carried upstream of an expressed reporter gene should
carry a promoter or portion of a gene normally upregulated during infection.
Sequencing upstream of the reporter gene allows identification of the up-regulated
gene.

3) Differential display

This technique is described by Chuang et al., J. Bacteriol. 175:2026-2036
(1993), the contents of which are incorporated by reference for background
purposes. This method identifies those genes which are expressed in an
organism by identifying mRNA present using randomly-primed RT-PCR. By
comparing pre-infection and post-infection profiles, genes up- and down-regulated
during infection can be identified and the RT-PCR product sequenced and
matched to library sequences.

4) Generation of conditional lethal mutants by transposon mutagenesis. 

This technique, described by de Lorenzo, V. et al., Gene 123:17-24
(1993); Neuwald, A. F. et al., Gene 125: 69-73 (1993); and Takiff, H. E. et al.,
J. Bacteriol. 174:1544-1553 (1992), the contents of which are incorporated by
reference for background purposes, identifies genes whose expression is
essential for cell viability.

In this technique transposons carrying controllable promoters, which
provide transcription outward from the transposon in one or both directions, are
generated. Random insertion of these transposons into target organisms and
subsequent isolation of insertion mutants in the presence of an inducer of promoter
activity ensures that insertions which separate the promoter from the coding region of a
gene whose expression is essential for cell viability will be recovered.
Subsequent replica plating in the absence of inducer identifies such insertions,
since they fail to survive. Sequencing of the flanking regions of the transposon
allows identification of the site of insertion and identification of the gene disrupted.
Close monitoring of the changes in cellular processes/morphology during growth
in the absence of inducer yields information on the likely function of the gene. Such
monitoring could include flow cytometry (cell division, lysis, redox potential, DNA
replication), incorporation of radiochemically labeled precursors into DNA, RNA,
protein, lipid and peptidoglycan, and monitoring of reporter enzyme gene fusions which
respond to known cellular stresses.

5) Generation of conditional lethal mutants by chemical mutagenesis. 

This technique is described by Beckwith, J., Methods in Enzymology 204:
3-18 (1991), the contents of which are incorporated herein by reference for
background purposes. In this technique random chemical mutagenesis of the target
organism, growth at a temperature other than physiological temperature (permissive
temperature) and subsequent replica plating and growth at a different temperature
(e.g., 42°C to identify ts; 25°C to identify cs) are used to identify those isolates
which now fail to grow (conditional mutants). As above, close monitoring of the
changes upon growth at the non-permissive temperature yields information on the
function of the mutated gene. Complementation of the conditional lethal mutation by a
library from the target organism and sequencing of the complementing gene allows
matching with library sequences.
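The replica-plating read-out for temperature-conditional mutants can be summarized as a small
decision rule. The sketch below assumes each isolate has been scored for growth (yes/no) at the
permissive temperature and on replica plates at 42°C and 25°C; the function name and the returned
labels are hypothetical conveniences for the example.

```python
def classify_isolate(grows_permissive, grows_at_42c, grows_at_25c):
    """Classify one isolate from replica-plating results.

    ts (temperature sensitive): grows at the permissive temperature but not at 42 C.
    cs (cold sensitive): grows at the permissive temperature but not at 25 C.
    """
    if not grows_permissive:
        # Without growth on the master plate the replica results are uninformative.
        return "not viable"
    ts = not grows_at_42c
    cs = not grows_at_25c
    if ts and cs:
        return "ts and cs"
    if ts:
        return "ts"
    if cs:
        return "cs"
    return "unconditional"


if __name__ == "__main__":
    # isolate name -> (permissive, 42 C, 25 C) growth scores; the data are made up.
    plate = {"iso1": (True, False, True), "iso2": (True, True, False), "iso3": (True, True, True)}
    for name, growth in plate.items():
        print(name, classify_isolate(*growth))
```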

Each of these techniques may have advantages or disadvantages
depending on the particular application. The skilled artisan would choose the
approach that is the most relevant with the particular end use in mind. For
example, some genes might be recognised as essential for infection but in reality
are only necessary for the initiation of infection and so their products would
represent relatively unattractive targets for antibacterials developed to cure
established and chronic infections.

6) RT-PCR

Bacterial messenger RNA, preferably that of S. epidermidis, is isolated
from bacterial infected tissue, e.g., 48 hour murine lung infections, and the amount
of each mRNA species is assessed by reverse transcription of the RNA sample
primed with random hexanucleotides followed by PCR with gene specific primer
pairs. The determination of the presence and amount of a particular mRNA
species by quantification of the resultant PCR product provides information on the
bacterial genes which are transcribed in the infected tissue. Analysis of gene
transcription can be carried out at different times of infection to gain a detailed
knowledge of gene regulation in bacterial pathogenesis, allowing for a clearer
understanding of which gene products represent targets for screens for novel
antibacterials. Because of the gene specific nature of the PCR primers employed,
it should be understood that the bacterial mRNA preparation need not be free of
mammalian RNA. This allows the investigator to carry out a simple and quick
RNA preparation from infected tissue to obtain bacterial mRNA species which are
very short lived in the bacterium (in the order of 2 minute half-lives). Optimally the
bacterial mRNA is prepared from infected murine lung tissue by mechanical
disruption in the presence of TRIzol (GIBCO-BRL) for very short periods of time,
subsequent processing according to the manufacturer's instructions for TRIzol reagent and
DNAase treatment to remove contaminating DNA. Preferably the process is
optimized by finding those conditions which give a maximum amount of bacterial
16S ribosomal RNA, preferably that of S. epidermidis, as detected by probing
Northerns with a suitably labeled sequence specific oligonucleotide probe.
Typically a 5' dye labelled primer is used in each PCR primer pair in a PCR
reaction which is terminated optimally between 8 and 25 cycles. The PCR
products are separated on 6% polyacrylamide gels with detection and
quantification using GeneScanner (manufactured by ABI).

Use of these technologies, when applied to the sequences of the
present invention, enables identification of bacterial proteins expressed during
infection, inhibitors of which would have utility in anti-bacterial therapy.

Polynucleotides 

The present invention relates to novel polynucleotides and novel
polypeptides of S. epidermidis, among other things, as described below. The
invention particularly relates to the nucleotide sequences set forth in the
Sequence Listing SEQ ID NOs: 1-3334, typically as odd numbered ID numbers,
and the corresponding deduced amino acid sequences also set forth in the
Sequence Listing SEQ ID NOs: 1-3334, typically as even numbered ID numbers.
SEQ ID NOs: 1-3334 refer to open reading frames (ORFs). The invention also
relates to consensus polynucleotide sequences from which the ORFs were
extracted. These genomic sequences include the ORFs, intergenic regions and
ribosomal RNA genes. Such genomic polynucleotides are set forth as SEQ ID
NOs: 3335-4464. It will be noted that minor errors in sequencing can occur which
do not depart from the spirit of the invention; S. epidermidis polynucleotides and
polypeptides having any corrected sequences are thus encompassed by this
invention.

Using the information provided herein and known, standard methods, such
as those for cloning and sequencing and those for synthesizing polynucleotides
and polypeptides (see, e.g., Sambrook et al., Molecular Cloning: A Laboratory
Manual, 2d Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
(1989)), one can generate numerous unique fragments, both longer and shorter
than the polynucleotides and polypeptides set forth in the Sequence Listing, of the
S. epidermidis genome and the S. epidermidis coding regions, which are
encompassed by the present invention. To be unique, a fragment must be of
sufficient size to distinguish it from other known nucleic acid sequences, most
readily determined by comparing any selected S. epidermidis fragment to the
nucleotide sequences in computer databases such as GenBank. Such
comparative searches are standard in the art. Many unique fragments will be S.
epidermidis-specific. Typically, a unique fragment useful as a primer or probe
will be at least about 20 to about 25 nucleotides in length, depending upon the
specific nucleotide content of the sequence. Additionally, fragments can be, for
example, at least about 30, 40, 50, 60, 75, 80, 90, 100, 150, 200, 250, 300, 400,
500 or more nucleotides in length. The nucleic acid fragment can be single,
double or triple stranded, depending upon the purpose for which it is intended.
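The uniqueness test described above, comparing a candidate fragment against other known
sequences, can be illustrated on a small scale. The sketch below is only a stand-in for a database
search such as one against GenBank: it uses exact substring matching on both strands against a
user-supplied list of background sequences, and the function names and the default window length
of 20 are assumptions for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def unique_fragments(target, others, length=20):
    """List (position, fragment) windows of `target` that occur in none of `others`.

    Each background sequence is checked on both strands. Exact matching is a
    simplification; a real search would also tolerate near matches.
    """
    target = target.upper()
    background = [s.upper() for s in others]
    background += [reverse_complement(s) for s in others]
    hits = []
    for i in range(len(target) - length + 1):
        fragment = target[i:i + length]
        if not any(fragment in s for s in background):
            hits.append((i, fragment))
    return hits


if __name__ == "__main__":
    # Toy sequences with a short window so the behaviour is easy to follow.
    for pos, frag in unique_fragments("ACGTTGCA", ["TGCAA"], length=4):
        print(pos, frag)
```

With the toy input, the windows starting at positions 3 and 4 are rejected because they occur in
the background sequence or its reverse complement, which is why both strands are searched.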

Additionally, as discussed above and below, modifications can be made to
the S. epidermidis polynucleotides and polypeptides that are encompassed by the
present invention. For example, nucleotide substitutions can be made which do
not affect the polypeptide encoded by the nucleic acid, and thus any
polynucleotide which encodes the polypeptides of this invention is within the
present invention. Additionally, certain amino acid substitutions (and
corresponding nucleotide substitutions to encode them) can be made which are
known in the art to be neutral (Robinson W.E. Jr. and Mitchell, W.M., AIDS 4:
S141-S162 (1990)). Such variations may arise naturally as allelic variations (e.g.,
due to genetic polymorphism) or may be produced by human intervention (e.g., by
mutagenesis of cloned DNA sequences), such as induced point, deletion,
insertion and substitution mutations. Minor changes in amino acid sequence are
generally preferred, such as conservative amino acid replacements, small internal
deletions or insertions, and additions or deletions at the ends of the molecules.
Substitutions may be designed based on, for example, the model of Dayhoff, et al.
(in Atlas of Protein Sequence and Structure 1978, Nat'l Biomed. Res. Found.,
Washington D.C.). These modifications can result in changes in the amino acid
sequence, provide silent mutations, modify a restriction site, or provide other
specific mutations. Likewise, such amino acid changes result in a different nucleic
acid encoding the polypeptides and proteins. Thus, alternative polynucleotides,
which are within the parameters of the present invention, are contemplated by
such modifications.

Furthermore, the polynucleotide sequences set forth as SEQ ID Nos: 1-3334
in the Sequence Listing are open reading frames (ORFs), i.e., coding
regions of S. epidermidis. The polypeptide encoded by each open reading frame
can be deduced, and the molecular weight of the polypeptide thus calculated
using amino acid residue molecular weight values well known in the art. Any




selected coding region can be functionally linked, using standard techniques such
as standard subcloning techniques, to any desired regulatory sequence, whether
a S. epidermidis regulatory sequence or a heterologous regulatory sequence, or
to a heterologous coding sequence to create a fusion protein, as further described
herein.
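As a minimal sketch of the molecular weight calculation mentioned above, the residue masses of a deduced polypeptide can be summed and one water added for the free termini. The table below uses approximate average residue masses in daltons; the names RESIDUE_MASS, WATER_MASS and molecular_weight are illustrative assumptions and are not taken from the Sequence Listing.

```python
# Approximate average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153  # added once for the free amino and carboxyl termini


def molecular_weight(polypeptide: str) -> float:
    """Sum the residue masses of a one-letter polypeptide and add one water."""
    if not polypeptide:
        raise ValueError("empty polypeptide sequence")
    return sum(RESIDUE_MASS[aa] for aa in polypeptide.upper()) + WATER_MASS
```

For example, molecular_weight("MKA") gives the estimated mass of a hypothetical three-residue peptide; post-translational modifications are not considered.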

Polynucleotides of the present invention may be in the form of RNA, such as
mRNA or cRNA, or in the form of DNA, including, for instance, cDNA and genomic
DNA obtained by cloning or produced by chemical synthetic techniques or by a
combination thereof. The DNA may be triple-stranded, double-stranded or
single-stranded. Single-stranded DNA may be the coding strand, also known as the sense
strand, or it may be the non-coding strand, also referred to as the anti-sense strand.

The coding sequence which encodes a S. epidermidis polypeptide of this
invention may be identical to the coding sequence of a polynucleotide set forth in the
Sequence Listing. It also may be a polynucleotide with a different sequence which, as
a result of the redundancy (degeneracy) of the genetic code, encodes a S.
epidermidis polypeptide set forth in the Sequence Listing.
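The degeneracy referred to above can be illustrated with a short sketch in which two different coding sequences are translated to the same polypeptide. The codon table is the standard genetic code written in the usual compact TCAG order; the two example sequences are hypothetical and are not drawn from the Sequence Listing.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ("*" marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(coding_dna: str) -> str:
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(coding_dna) - 2, 3):
        residue = CODON_TABLE[coding_dna[pos:pos + 3].upper()]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Two hypothetical coding sequences that differ at the third codon positions.
seq_a = "ATGAAAGCTTAA"
seq_b = "ATGAAGGCATAA"
same_polypeptide = translate(seq_a) == translate(seq_b)
```

Here seq_a and seq_b differ in nucleotide sequence, yet same_polypeptide is true because AAA/AAG both encode Lys and GCT/GCA both encode Ala.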

Polynucleotides of the present invention which encode a S. epidermidis
polypeptide set forth in the Sequence Listing may include, but are not limited to, the
coding sequence for a mature polypeptide, by itself; the coding sequence for a
mature polypeptide and additional coding sequences, such as those encoding a
leader or secretory sequence, such as a pre-, or pro- or prepro- protein sequence;
the coding sequence of a mature polypeptide, with or without the aforementioned
additional coding sequences, together with additional, non-coding sequences,
including for example, but not limited to non-coding 5' and 3' sequences, such as the
transcribed, non-translated sequences that play a role in transcription (including
termination signals, for example), ribosome binding, mRNA stability elements, and
additional coding sequence which encode additional amino acids, such as those
which provide additional functionalities. Thus, for instance, a polypeptide may be
fused to a marker sequence, such as a peptide, which facilitates purification of the
fused polypeptide. In certain embodiments of this aspect of the invention, the
marker sequence is a hexa-histidine peptide, such as the tag provided in the pQE
vector (Qiagen, Inc.), among others, many of which are commercially available. As
described in Gentz et al., Proc. Natl. Acad. Sci. USA 86: 821-824 (1989), for
instance, hexa-histidine provides for convenient purification of the fusion protein.
The HA tag may also be used to create fusion proteins and corresponds to an
epitope derived from the influenza hemagglutinin protein, which has been described by




Wilson et al., Cell 37: 767 (1984), for instance. Polynucleotides of the invention also
include, but are not limited to, polynucleotides comprising a structural gene and its 
naturally associated genetic elements. 

In accordance with the foregoing, the term "polynucleotide encoding a
polypeptide" as used herein encompasses polynucleotides which include a
sequence encoding a polypeptide of the present invention, particularly a polypeptide
having a S. epidermidis amino acid sequence set forth in the Sequence Listing. The
term encompasses polynucleotides that include a single continuous region or
discontinuous regions encoding the polypeptide (for example, interrupted by
integrated phage or insertion sequence or editing) together with additional regions,
that also may contain coding and/or non-coding sequences.

The present invention further relates to variants of the herein above
described polynucleotides which encode for fragments, analogs and derivatives of
the polypeptide having a deduced S. epidermidis amino acid sequence set forth in
the Sequence Listing. A variant of the polynucleotide may be a naturally occurring
variant such as a naturally occurring allelic variant, or it may be a variant that is not
known to occur naturally. Such non-naturally occurring variants of the polynucleotide
may be made by mutagenesis techniques, including those applied to
polynucleotides, cells or organisms.

Among variants in this regard are variants that differ from the aforementioned
polynucleotides by nucleotide substitutions, deletions or additions. The substitutions,
deletions or additions may involve one or more nucleotides. The variants may be
altered in coding or non-coding regions or both. Alterations in the coding regions
may produce conservative or non-conservative amino acid substitutions, deletions or
additions. Preferred are polynucleotides encoding a variant, analog, derivative or
fragment, or a variant, analog or derivative of a fragment, which have a S.
epidermidis sequence as set forth in the Sequence Listing, in which several, a few, 5
to 10, 1 to 5, 1 to 3, 2, 1 or no amino acid(s) is substituted, deleted or added, in any
combination. Especially preferred among these are silent substitutions, additions
and deletions, which do not alter the properties and activities of the S. epidermidis
polypeptides set forth in the Sequence Listing. Also especially preferred in this
regard are conservative substitutions.

Further preferred embodiments of the invention are polynucleotides that
are at least 70% identical over their entire length to a polynucleotide encoding a
polypeptide having an amino acid sequence set forth in the Sequence Listing, and
polynucleotides which are complementary to such polynucleotides. Alternatively,
most highly preferred are polynucleotides that comprise a region that is at least
80% or at least 85% identical over their entire length to a polynucleotide encoding
a S. epidermidis polypeptide set forth in the Sequence Listing, including
complementary polynucleotides. In this regard, polynucleotides at least 90%,
91%, 92%, 93%, 94%, 95%, or 96% identical over their entire length to the same
are particularly preferred, and among these particularly preferred polynucleotides,
those with at least 95% are especially preferred. Furthermore, those with at least
97% are highly preferred among those with at least 95%, and among these, those
with at least 98% and at least 99% are particularly highly preferred, with at least 99%
or 99.5% being the more preferred.
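The identity thresholds recited above can be made concrete with a small sketch. It assumes that the two sequences have already been aligned over their entire length (equal length, with "-" marking any gap); the alignment step itself, and the function names percent_identity and meets_threshold, are assumptions of this illustration and are not specified in the text.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned positions at which both sequences carry the same base.

    Both inputs must already be aligned to equal length; gap positions
    (marked "-") never count as matches.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be non-empty and of equal aligned length")
    a, b = aligned_a.upper(), aligned_b.upper()
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """Return True if identity over the entire aligned length reaches the threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

The default of 70% mirrors the lowest threshold recited above; the higher thresholds (80%, 85%, 90% and so on) can be passed as the threshold argument.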

Preferred embodiments in this respect, moreover, are polynucleotides which 
encode polypeptides which retain substantially the same biological function or 
activity as the mature polypeptide encoded by the DNA set forth in the Sequence 
Listing. 

The present invention further relates to polynucleotides that hybridize to the
herein above-described sequences. In this regard, the present invention especially
relates to polynucleotides which hybridize under stringent conditions to the herein
above-described polynucleotides. Stringent conditions are typically selective
conditions. As herein used, the term "stringent conditions" means hybridization will
occur only if there is at least 95% and preferably at least 97% identity between the
sequences. For a specific sequence, stringent conditions can be determined
empirically according to the nucleotide content, as is known in the art. A typical
example of stringent conditions is hybridization of a 48mer having 55% GC
content at 42°C in 50% formamide and 750 mM NaCl followed by washing at 55°C in
15 mM NaCl and 0.1% SDS.

As discussed additionally herein regarding polynucleotide assays of the
invention, for instance, polynucleotides of the invention as discussed above may be
used as a hybridization probe for RNA, cDNA and genomic DNA to isolate full-length
cDNAs and genomic clones encoding polypeptides of the present invention and to
isolate cDNA and genomic clones of other genes that have a high sequence
similarity to the polynucleotides of the present invention. Such probes generally will
comprise at least 15 bases. Preferably, such probes will have at least 20, at least 25
or at least 30 bases, and may have at least 50 bases. Particularly preferred probes
will have at least 30 bases, and will have 50 bases or less, such as 30, 35, 40, 45, or
50 bases.

For example, the coding region of the polynucleotide of the present
invention may be isolated by screening using the known DNA sequence to
synthesize an oligonucleotide probe. A labeled oligonucleotide having a sequence
complementary to that of a gene of the present invention is then used to screen a
library of cDNA, genomic DNA or mRNA to determine to which members of the
library the probe hybridizes.
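As a rough illustration of the probe preferences discussed above, the following sketch reports the length and GC content of a candidate oligonucleotide probe and whether its length falls within the particularly preferred range of 30 to 50 bases. The function names and the dictionary layout are assumptions of this sketch; GC content is shown only because nucleotide content is stated to bear on the choice of stringent conditions, and no melting temperature is estimated.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return 100.0 * sum(1 for base in seq if base in ("G", "C")) / len(seq)


def probe_summary(seq: str, min_len: int = 30, max_len: int = 50) -> dict:
    """Summarize a candidate probe against the preferred length range."""
    return {
        "length": len(seq),
        "gc_percent": gc_percent(seq),
        "in_preferred_range": min_len <= len(seq) <= max_len,
    }
```

A call such as probe_summary("ACGT" * 10) describes a hypothetical 40-base probe; the bounds can be relaxed to the broader minimum of 15 bases by passing min_len=15.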

The polynucleotides and polypeptides of the present invention may be
employed as reagents and materials for development of treatments of and
diagnostics for disease, particularly human disease, as further discussed herein
relating to polynucleotide assays, inter alia.

The polynucleotides of the present invention that are oligonucleotides can be
used in the processes herein as described, but preferably for PCR, to determine
whether or not the S. epidermidis genes identified herein in whole or in part are
present and/or transcribed in infected tissue such as blood. It is recognized that
such sequences will also have utility in diagnosis of the stage of infection and type
of infection the pathogen has attained.

The polynucleotides may encode a polypeptide which is the mature protein
plus additional amino or carboxyl-terminal amino acids, or amino acids interior to the
mature polypeptide (when the mature form has more than one polypeptide chain, for
instance). Such sequences may play a role in processing of a protein from precursor
to a mature form, may allow protein transport, may lengthen or shorten protein
half-life or may facilitate manipulation of a protein for assay or production, among other
things. As generally is the case in vivo, the additional amino acids may be
processed away from the mature protein by cellular enzymes.

A precursor protein, having the mature form of the polypeptide fused to one
or more prosequences, may be an inactive form of the polypeptide. When
prosequences are removed such inactive precursors generally are activated. Some
or all of the prosequences may be removed before activation. Generally, such
precursors are called proproteins.

The present invention additionally contemplates polynucleotides functionally
encoding fusion polypeptides wherein the fusion polypeptide comprises a fragment
of a S. epidermidis polypeptide and one or more polypeptide(s) derived from another
S. epidermidis polypeptide or from another organism or a synthetic polyamino acid
sequence. Such polynucleotides may or may not encode amino acid sequences to
facilitate cleavage of the S. epidermidis polypeptide from the other polypeptide(s)
under appropriate conditions.

In sum, a polynucleotide of the present invention may encode a mature
protein, a mature protein plus a leader sequence (which may be referred to as a
preprotein), a precursor of a mature protein having one or more prosequences
which are not the leader sequences of a preprotein, or a preproprotein, which is a
precursor to a proprotein, having a leader sequence and one or more prosequences,
which generally are removed during processing steps that produce active and
mature forms of the polypeptide.

Polypeptides 

The present invention further relates to peptides, polypeptides and
proteins (collectively referred to as "polypeptides") of S. epidermidis. The amino
acid sequence of these polypeptides is set forth in the Sequence Listing.

The invention also relates to fragments, analogs and derivatives of these
polypeptides. The terms "fragment," "derivative" and "analog" when referring to a
polypeptide whose amino acid sequence is set forth in the Sequence Listing, mean
a polypeptide which retains essentially the same biological function or activity as
such polypeptide. Thus, an analog includes a proprotein which can be activated by
cleavage of the proprotein portion to produce an active mature polypeptide.

The fragment, derivative or analog of the polypeptide of the present
invention may be (i) one in which one or more of the amino acid residues are
substituted with a conserved or non-conserved amino acid residue (preferably a
conserved amino acid residue) and such substituted amino acid residue may or may
not be one encoded by the genetic code, or (ii) one in which one or more of the
amino acid residues includes a substituent group, or (iii) one in which the mature
polypeptide is fused with another compound, such as a compound to increase the
half-life of the polypeptide (for example, polyethylene glycol), or (iv) one in which the
additional amino acids are fused to the mature polypeptide, such as a leader or
secretory sequence or a sequence which is employed for purification of the mature
polypeptide or a proprotein sequence. Such fragments, derivatives and analogs are
deemed to be within the scope of those skilled in the art from the teachings herein.

Among the particularly preferred embodiments of the invention in this regard
are polypeptides set forth in the Sequence Listing, variants, analogs, derivatives and
fragments thereof, and variants, analogs and derivatives of the fragments.
Additionally, fusion polypeptides comprising such polypeptides, variants, analogs,
derivatives and fragments thereof, and variants, analogs and derivatives of the
fragments, in addition to a heterologous polypeptide, are contemplated by the




present invention. Such fusion polypeptides and proteins, as well as
polynucleotides encoding them, can readily be made using standard techniques,
including standard recombinant techniques for producing and expressing a
recombinant polynucleic acid encoding a fusion protein.

Among preferred variants are those that vary from a reference by
conservative amino acid substitutions. Such substitutions are those that substitute a
given amino acid in a polypeptide by another amino acid of like characteristics.
Typically seen as conservative substitutions are the replacements, one for another,
among the aliphatic amino acids Ala, Val, Leu and Ile; interchange of the hydroxyl
residues Ser and Thr; exchange of the acidic residues Asp and Glu; substitution
between the amide residues Asn and Gln; exchange of the basic residues Lys and
Arg; and replacements among the aromatic residues Phe and Tyr.
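The groups of conservative substitutions listed above lend themselves to a simple lookup. The sketch below encodes those six groups with one-letter amino acid codes and counts conservative and non-conservative differences between a reference polypeptide and a variant of equal length; the function names and the equal-length restriction are assumptions of this illustration.

```python
# Substitution groups as listed in the text, in one-letter code.
CONSERVATIVE_GROUPS = [
    {"A", "V", "L", "I"},  # aliphatic: Ala, Val, Leu, Ile
    {"S", "T"},            # hydroxyl: Ser, Thr
    {"D", "E"},            # acidic: Asp, Glu
    {"N", "Q"},            # amide: Asn, Gln
    {"K", "R"},            # basic: Lys, Arg
    {"F", "Y"},            # aromatic: Phe, Tyr
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues belong to the same listed group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def count_substitutions(reference: str, variant: str) -> tuple[int, int]:
    """Return (conservative, non_conservative) counts for equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only, not insertions or deletions")
    conservative = non_conservative = 0
    for a, b in zip(reference.upper(), variant.upper()):
        if a == b:
            continue
        if is_conservative(a, b):
            conservative += 1
        else:
            non_conservative += 1
    return conservative, non_conservative
```

Only substitutions are handled; deletions and additions, which the text also contemplates, would require an alignment step that is outside this sketch.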

Further particularly preferred in this regard are variants, analogs, derivatives
and fragments, and variants, analogs and derivatives of the fragments, having the
amino acid sequence of any polypeptide set forth in the Sequence Listing, in which
several, a few, 5 to 10, 1 to 5, 1 to 3, 2, 1 or no amino acid residues are substituted,
deleted or added, in any combination. Especially preferred among these are silent
substitutions, additions and deletions, which do not alter the properties and activities
of the polypeptide of the present invention. Also especially preferred in this regard
are conservative substitutions. Most highly preferred are polypeptides having an
amino acid sequence set forth in the Sequence Listing without substitutions.

The polypeptides and polynucleotides of the present invention are preferably 
provided in an isolated form, and preferably are purified to homogeneity. 

The polypeptides of the present invention include any polypeptide set forth in
the Sequence Listing (in particular a mature polypeptide) as well as polypeptides
which have at least 70% identity to a polypeptide set forth in the Sequence Listing,
preferably at least 80% or 85% identity to a polypeptide set forth in the Sequence
Listing, and more preferably at least 90% similarity (more preferably at least 90%
identity) to a polypeptide set forth in the Sequence Listing and still more preferably at
least 95%, 96%, 97%, 98%, 99%, or 99.5% similarity (still more preferably at least
95%, 96%, 97%, 98%, 99%, or 99.5% identity) to a polypeptide set forth in the
Sequence Listing and also include portions of such polypeptides with such portion of
the polypeptide generally containing at least 30 amino acids and more preferably at
least 50 amino acids, such as 30, 35, 40, 45 or 50 amino acids.

Fragments or portions of the polypeptides of the present invention may be
employed for producing the corresponding full-length polypeptide by peptide
synthesis; therefore, the fragments may be employed as intermediates for producing
the full-length polypeptides. Fragments or portions of the polynucleotides of the
present invention may be used to synthesize full-length polynucleotides of the
present invention.

Fragments

Also among preferred embodiments of this aspect of the present invention 
are polypeptides comprising fragments of the polypeptide having the amino acid 
sequence set forth in the Sequence Listing, and fragments of variants and 
derivatives of the polypeptides set forth in the Sequence Listing. 

In this regard a fragment is a polypeptide having an amino acid sequence
that entirely is the same as part but not all of the amino acid sequence of the
aforementioned S. epidermidis polypeptides and variants or derivatives thereof.

Such fragments may be "free-standing," i.e., not part of or fused to other
amino acids or polypeptides, or they may be comprised within a larger polypeptide of
which they form a part or region. When comprised within a larger polypeptide, the
presently discussed fragments most preferably form a single continuous region.
However, several fragments may be comprised within a single larger polypeptide.
For instance, certain preferred embodiments relate to a fragment of a polypeptide of
the present invention comprised within a precursor polypeptide designed for
expression in a host and having heterologous pre and pro-polypeptide regions fused
to the amino terminus of the fragment and an additional region fused to the carboxyl
terminus of the fragment. Therefore, fragments, in one aspect of the meaning
intended herein, refer to the portion or portions of a fusion polypeptide or fusion
protein derived from a polypeptide of the present invention.

Representative examples of polypeptide fragments of the invention include,
for example, in any selected polypeptide, fragments from about amino acid number
1-20, 21-40, 41-60, 61-80, 81-100, 101-200, and 201-300, or, at the COOH-terminal
end, the C-terminal 20 amino acids, the C-terminal 30 amino acids, the C-terminal 40
amino acids, the C-terminal 50 amino acids, and any combination of these
fragments, such as fragments from about amino acid number 1-40, 1-60, 21-60,
41-80, 61-100, and the like.

In this context "about" herein includes the particularly recited ranges larger or 
smaller by several, a few, 5, 4, 3, 2 or 1 amino acid at either extreme or at both 
extremes. 
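The representative fragment ranges above use 1-based, inclusive residue numbering. A minimal sketch of how such fragments could be taken from a polypeptide sequence follows; the function names are illustrative assumptions, and the tolerance expressed by "about" is not modeled.

```python
def fragment(polypeptide: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive numbering."""
    if not 1 <= start <= end <= len(polypeptide):
        raise ValueError("range falls outside the polypeptide")
    return polypeptide[start - 1:end]


def c_terminal(polypeptide: str, n: int) -> str:
    """Return the C-terminal n residues."""
    if not 1 <= n <= len(polypeptide):
        raise ValueError("n must be between 1 and the polypeptide length")
    return polypeptide[-n:]


def window_ranges(length: int, size: int = 20) -> list[tuple[int, int]]:
    """List consecutive 1-based ranges (1-20, 21-40, ...) covering the length."""
    return [(s + 1, min(s + size, length)) for s in range(0, length, size)]
```

Combinations such as 1-40 or 21-60 are obtained by calling fragment with the outer bounds of adjacent windows.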

Preferred fragments of the invention include, for example, truncation
polypeptides including polypeptides having an amino acid sequence set forth in the
Sequence Listing, or of variants or derivatives thereof, except for deletion of a
continuous series of residues (that is, a continuous region, part or portion) that
includes the amino terminus, or a continuous series of residues that includes the
carboxyl terminus or, as in double truncation mutants, deletion of two continuous
series of residues, one including the amino terminus and one including the carboxyl
terminus. Fragments having the size ranges set out above also are preferred
embodiments of truncation fragments, which are especially preferred among
fragments generally. Degradation forms of the polypeptides of the invention in a
host cell are also preferred.

Also preferred in this aspect of the invention are fragments characterized by
structural or functional attributes of the polypeptide of the present invention.
Preferred embodiments of the invention in this regard include fragments that
comprise alpha-helix and alpha-helix forming regions, beta-sheet and
beta-sheet-forming regions, turn and turn-forming regions, coil and coil-forming regions,
hydrophilic regions, hydrophobic regions, alpha amphipathic regions, beta
amphipathic regions, flexible regions, surface-forming regions, substrate binding
region, and high antigenic index regions of the polypeptide of the present invention,
and combinations of such fragments.

Preferred regions are those that mediate activities of the polypeptide of the
present invention. Most highly preferred in this regard are fragments that have a
chemical, biological or other activity of the polypeptide of the present invention,
including those with a similar activity or an improved activity, or with a decreased
undesirable activity. Particularly preferred are fragments comprising receptors or
domains of enzymes that confer a function essential for viability of S. epidermidis or
the ability to cause disease in humans. Further preferred polypeptide fragments are
those that comprise or contain antigenic or immunogenic determinants in an animal,
especially in a human.

It will be appreciated that the invention also relates to, among others,
polynucleotides encoding the aforementioned fragments, polynucleotides that
hybridize to polynucleotides encoding the fragments, particularly those that hybridize
under stringent conditions, and polynucleotides, such as PCR primers, for amplifying
polynucleotides that encode the fragments. In these regards, preferred
polynucleotides are those that correspond to the preferred fragments, as discussed
above.

Vectors, host cells, expression

The present invention also relates to vectors which comprise a
polynucleotide or polynucleotides of the present invention, host cells which are
genetically engineered with vectors of the invention and the production of
polypeptides of the invention by recombinant techniques.

Host cells can be genetically engineered to incorporate polynucleotides and
express polypeptides of the present invention. Introduction of a polynucleotide into
the host cell can be effected by calcium phosphate transfection, DEAE-dextran
mediated transfection, transvection, microinjection, cationic lipid-mediated
transfection, electroporation, transduction, scrape loading, ballistic introduction,
infection or other methods. Such methods are described in many standard
laboratory manuals, such as Davis et al., BASIC METHODS IN MOLECULAR
BIOLOGY, (1986) and Sambrook et al., MOLECULAR CLONING: A LABORATORY
MANUAL, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
(1989).

Polynucleotide constructs in host cells can be used in a conventional manner 
to produce the gene product encoded by the recombinant sequence. Alternatively, 
the polypeptides of the invention can be synthetically produced by conventional 
peptide synthesizers. 

Mature proteins can be expressed in mammalian cells, yeast, bacteria, or
other cells under the control of appropriate promoters. Cell-free translation systems
can also be employed to produce such proteins using RNAs derived from the DNA
constructs of the present invention. Appropriate cloning and expression vectors for
use with prokaryotic and eukaryotic hosts are described by Sambrook et al.,
MOLECULAR CLONING: A LABORATORY MANUAL, 2nd Ed., Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y. (1989).

In accordance with this aspect of the invention the vector may be, for
example, a plasmid vector, a single or double-stranded phage vector, a single or
double-stranded RNA or DNA viral vector. Plasmids generally are designated herein
by a lower case p preceded and/or followed by capital letters and/or numbers, in
accordance with standard naming conventions that are familiar to those of skill in the
art. Starting plasmids disclosed herein are either commercially available, publicly
available, or can be constructed from available plasmids by routine application of
well known, published procedures, given the teachings herein. Many plasmids and
other cloning and expression vectors that can be used in accordance with the
present invention are well known and readily available to those of skill in the art.

Preferred among vectors, in certain respects, are those for expression of
polynucleotides and polypeptides of the present invention. Generally, such vectors
comprise cis-acting control regions effective for expression in a host operatively
linked to the polynucleotide to be expressed. Appropriate trans-acting factors either
are supplied by the host, supplied by a complementing vector or supplied by the
vector itself upon introduction into the host.

In certain preferred embodiments in this regard, the vectors provide for
specific expression. Such specific expression may be inducible expression or
expression only in certain types of cells or both inducible and cell-specific.
Particularly preferred among inducible vectors are vectors that can be induced for
expression by environmental factors that are easy to manipulate, such as
temperature and nutrient additives. A variety of vectors suitable to this aspect of the
invention, including constitutive and inducible expression vectors for use in
prokaryotic and eukaryotic hosts, are well known and employed routinely by those of
skill in the art.

A great variety of expression vectors can be used to express a polypeptide of
the invention. Such vectors include, among others, chromosomal, episomal and
virus-derived vectors, e.g., vectors derived from bacterial plasmids, from
bacteriophage, from transposons, from yeast episomes, from insertion elements,
from yeast chromosomal elements, from viruses such as baculoviruses, papova
viruses, such as SV40, vaccinia viruses, adenoviruses, fowl pox viruses,
pseudorabies viruses and retroviruses, and vectors derived from combinations
thereof, such as those derived from plasmid and bacteriophage genetic elements,
such as cosmids and phagemids. All may be used for expression in accordance with
this aspect of the present invention. Generally, any vector suitable to maintain,
propagate or express polynucleotides to express a polypeptide in a host may be
used for expression in this regard.

The appropriate DNA sequence may be inserted into the vector by any of a
variety of well-known and routine techniques, such as, for example, those set forth in
Sambrook et al., MOLECULAR CLONING, A LABORATORY MANUAL, 2nd Ed.;
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989).

The DNA sequence in the expression vector is operatively linked to
appropriate expression control sequence(s), including, for instance, a promoter to
direct mRNA transcription. Representatives of such promoters include, but are not
limited to, the phage lambda PL promoter, the E. coli lac, trp and tac promoters, the
SV40 early and late promoters and promoters of retroviral LTRs.

In general, expression constructs will contain sites for transcription initiation
and termination, and, in the transcribed region, a ribosome binding site for
translation. The coding portion of the mature transcripts expressed by the constructs
will include a translation initiating AUG at the beginning and a termination codon
appropriately positioned at the end of the polypeptide to be translated.
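The requirement that the coding portion begin with an initiating AUG and end with an appropriately positioned termination codon can be checked with a few lines of code. The sketch below is a simplified illustration: the function name is hypothetical, only AUG is accepted as the initiation codon, and the three standard termination codons are assumed.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard termination codons


def has_valid_coding_portion(transcript: str) -> bool:
    """Check that a coding portion starts with AUG, ends with a stop codon,
    is a whole number of codons, and has no stop codon in between.

    DNA input is accepted; T is converted to U before checking.
    """
    rna = transcript.upper().replace("T", "U")
    if len(rna) < 6 or len(rna) % 3 != 0:
        return False
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    starts_with_aug = codons[0] == "AUG"
    ends_with_stop = codons[-1] in STOP_CODONS
    internal_stop = any(codon in STOP_CODONS for codon in codons[1:-1])
    return starts_with_aug and ends_with_stop and not internal_stop
```

Alternative bacterial initiation codons are deliberately not handled, since the text above refers only to an initiating AUG.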

In addition, the constructs may contain control regions that regulate as well 
as engender expression. Generally, in accordance with many commonly practiced 
procedures, such regions will operate by controlling transcription, such as 
transcription factors, repressor binding sites and termination, among others. 

Vectors for propagation and expression generally will include selectable
markers and amplification regions, such as, for example, those set forth in Sambrook
et al., MOLECULAR CLONING, A LABORATORY MANUAL, 2nd Ed.; Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, New York (1989).

Representative examples of appropriate hosts include bacterial cells, such as
streptococci, staphylococci, E. coli, streptomyces and Bacillus subtilis cells; fungal
cells, such as yeast cells and Aspergillus cells; insect cells such as Drosophila S2
and Spodoptera Sf9 cells; animal cells such as CHO, COS, HeLa, C127, 3T3, BHK,
293 and Bowes melanoma cells; and plant cells.

The following vectors, which are commercially available, are provided by way
of example. Among vectors preferred for use in bacteria are pQE70, pQE60 and
pQE-9, available from Qiagen; pBS vectors, Phagescript vectors, Bluescript vectors,
pNH8A, pNH16a, pNH18A, pNH46A, available from Stratagene; and ptrc99a,
pKK223-3, pKK233-3, pDR540, pRIT5 available from Pharmacia, and pBR322
(ATCC 37017). Among preferred eukaryotic vectors are pWLNEO, pSV2CAT,
pOG44, pXT1 and pSG available from Stratagene; and pSVK3, pBPV, pMSG and
pSVL available from Pharmacia. These vectors are listed solely by way of illustration
of the many commercially available and well known vectors that are available to
those of skill in the art for use in accordance with this aspect of the present invention.
It will be appreciated that any other plasmid or vector suitable for, for example,
introduction, maintenance, propagation or expression of a polynucleotide or
polypeptide of the invention in a host may be used in this aspect of the invention.

Promoter regions can be selected from any desired gene using vectors that
contain a reporter transcription unit lacking a promoter region, such as a
chloramphenicol acetyl transferase ("CAT") transcription unit, downstream of a
restriction site or sites for introducing a candidate promoter fragment; i.e., a fragment
that may contain a promoter. As is well known, introduction into the vector of a
promoter-containing fragment at the restriction site upstream of the cat gene
engenders production of CAT activity, which can be detected by standard CAT
assays. Vectors suitable to this end are well known and readily available, such as
pKK232-8 and pCM7. Promoters for expression of polynucleotides of the present
invention include not only well known and readily available promoters, but also
promoters that readily may be obtained by the foregoing technique, using a reporter
gene.

Among known prokaryotic promoters suitable for expression of
polynucleotides and polypeptides in accordance with the present invention are the E.
coli lacI and lacZ promoters, the T3 and T7 promoters, the gpt promoter, the
lambda PR, PL promoters and the trp promoter.

Among known eukaryotic promoters suitable in this regard are the CMV
immediate early promoter, the HSV thymidine kinase promoter, the early and late
SV40 promoters, the promoters of retroviral LTRs, such as those of the Rous
sarcoma virus ("RSV"), and metallothionein promoters, such as the mouse
metallothionein-I promoter.

Recombinant expression vectors will include, for example, origins of
replication, a promoter preferably derived from a highly-expressed gene to direct
transcription of a downstream structural sequence, and a selectable marker to permit
isolation of vector containing cells after exposure to the vector.

Polynucleotides of the invention, encoding the heterologous structural
sequence of a polypeptide of the invention, generally will be inserted into the vector
using standard techniques so that they are operably linked to the promoter for expression.
The polynucleotide will be positioned so that the transcription start site is located
appropriately 5' to a ribosome binding site. The ribosome binding site will be 5' to
the AUG that initiates translation of the polypeptide to be expressed. Generally,
there will be no other open reading frames that begin with an initiation codon,
usually AUG, and lie between the ribosome binding site and the initiation codon.
Also, generally, there will be a translation stop codon at the end of the polypeptide
and there will be a polyadenylation signal in constructs for use in eukaryotic hosts.
A transcription termination signal appropriately disposed at the 3' end of the
transcribed region may also be included in the polynucleotide construct.
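
The layout rules described above can be checked mechanically on a candidate
construct. The following is only a minimal sketch in Python, assuming the
construct is supplied as a coding-strand DNA string and that the positions of the
ribosome binding site and the initiation codon are already known. The function
name check_expression_cassette and its parameters are hypothetical and form
no part of the disclosure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_expression_cassette(seq, rbs_end, start):
    """Return a list of layout problems found in a candidate construct.

    seq     -- coding-strand DNA, written 5' to 3' (assumed input format)
    rbs_end -- index just past the ribosome binding site
    start   -- index of the first base of the initiation codon
    """
    seq = seq.upper()
    problems = []
    # The ribosome binding site must lie 5' to the initiation codon.
    if rbs_end > start:
        problems.append("ribosome binding site is not 5' to the initiation codon")
    if seq[start:start + 3] != "ATG":
        problems.append("no ATG at the stated initiation position")
    # No other initiation codon should lie between the RBS and the start.
    if "ATG" in seq[rbs_end:start]:
        problems.append("extra ATG between ribosome binding site and initiation codon")
    # Walk the reading frame codon by codon and look for an in-frame stop.
    codons = [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]
    if not any(codon in STOP_CODONS for codon in codons):
        problems.append("no in-frame translation stop codon")
    return problems
```

An empty list means the construct passed these particular checks; the sketch does
not examine promoters, polyadenylation signals or transcription terminators.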

For secretion of the translated protein into the lumen of the endoplasmic
reticulum, into the periplasmic space or into the extracellular environment,
appropriate secretion signals may be incorporated into the expressed polypeptide.
These signals may be endogenous to the polypeptide or they may be heterologous
signals.

The polypeptide may be expressed in a modified form, such as a fusion
protein, and may include not only secretion signals but also additional heterologous
functional regions. Thus, for instance, a region of additional amino acids, particularly
charged amino acids, may be added to the N- or C-terminus of the polypeptide to
improve stability and persistence in the host cell, during purification or during
subsequent handling and storage. Regions also may be added to the
polypeptide to facilitate purification. Such regions may be removed prior to final
preparation of the polypeptide. The addition of peptide moieties to polypeptides to
engender secretion or excretion, to improve stability or to facilitate purification,
among others, is a familiar and routine technique in the art. A preferred fusion
protein comprises a heterologous region from immunoglobulin that is useful to
solubilize or purify polypeptides. For example, EP-A-0 464 533 (Canadian
counterpart 2045869) discloses fusion proteins comprising various portions of the
constant region of immunoglobulin molecules together with another protein or part
thereof. In drug discovery, for example, proteins have been fused with antibody
Fc portions for the purpose of high-throughput screening assays to identify
antagonists. See, D. Bennett et al., Journal of Molecular Recognition, Vol. 8, 52-58
(1995) and K. Johanson et al., The Journal of Biological Chemistry, Vol. 270, No.
16, pp 9459-9471 (1995).

Cells typically then are harvested by centrifugation, disrupted by physical or
chemical means, and the resulting crude extract retained for further purification.

Microbial cells employed in expression of proteins can be disrupted by any
convenient method, including freeze-thaw cycling, sonication, mechanical disruption,
or use of cell lysing agents; such methods are well known to those skilled in the art.

Mammalian expression vectors may comprise expression sequences, such
as an origin of replication, a suitable promoter and enhancer, and also any
necessary ribosome binding sites, polyadenylation regions, splice donor and
acceptor sites, transcriptional termination sequences, and 5' flanking non-transcribed
sequences that are useful or necessary for expression.

The polypeptide can be recovered and purified from recombinant cell
cultures by well-known methods including ammonium sulfate or ethanol precipitation,
acid extraction, anion or cation exchange chromatography, phosphocellulose
chromatography, hydrophobic interaction chromatography, affinity chromatography,
hydroxylapatite chromatography and lectin chromatography. Most preferably, high
performance liquid chromatography is employed for purification. Well known
techniques for refolding protein may be employed to regenerate active conformation
when the polypeptide is denatured during isolation and/or purification.

Polynucleotide assays

This invention is also related to the use of the polynucleotides of the present
invention to detect complementary polynucleotides, for example as a
diagnostic reagent. Detection of complementary nucleotides in a eukaryote,
particularly a mammal, and especially a human, will provide a diagnostic method for
diagnosis of a disease. Eukaryotes (herein also "individual(s)"), particularly
mammals, and especially humans, infected with S. epidermidis may be detected at
the DNA level by a variety of techniques. By selecting regions of nucleic acids that
vary among strains of S. epidermidis, preferred candidates for distinguishing a
specific strain of S. epidermidis can be obtained. Furthermore, by selecting regions
of nucleic acids that vary between S. epidermidis and other organisms, preferred
candidates for distinguishing S. epidermidis from other organisms can be obtained.
Nucleic acids for diagnosis may be obtained from an infected individual's cells and
tissues, such as bone, blood, muscle, cartilage, and skin. Genomic DNA may be
used directly for detection or may be amplified enzymatically by using PCR (Saiki et
al., Nature, 324: 163-166 (1986)) prior to analysis. RNA or cDNA may also be used
in the same ways. As an example, PCR primers complementary to the nucleic acid
forming part of the polynucleotide of the present invention can be used to identify
and analyze for its presence and/or expression. Using PCR, characterization of the
strain of S. epidermidis present in a mammal, and especially a human, may be made
by an analysis of the genotype of the prokaryote gene. For example, deletions and
insertions can be detected by a change in size of the amplified product in
comparison to the genotype of a reference sequence. Point mutations can be
identified by hybridizing amplified DNA to radiolabeled RNA or alternatively,
radiolabeled antisense DNA sequences. Perfectly matched sequences can be
distinguished from mismatched duplexes by RNase A digestion or by differences in
melting temperatures.
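
The size comparison described above for detecting deletions and insertions can
be illustrated with a short sketch. It is written in Python and assumes that
amplicon lengths in base pairs have already been estimated, for example from a
gel or a fragment analyzer. The function name classify_amplicon and the
tolerance_bp parameter are hypothetical and are used here only for illustration.

```python
def classify_amplicon(reference_bp, observed_bp, tolerance_bp=0):
    """Compare an amplified product with the reference amplicon length.

    Returns a (kind, size_bp) tuple, where kind is "insertion",
    "deletion" or "no size change". tolerance_bp is an assumed sizing
    error below which a difference is ignored.
    """
    difference = observed_bp - reference_bp
    if abs(difference) <= tolerance_bp:
        # Point mutations fall here: they do not alter product length.
        return ("no size change", 0)
    if difference > 0:
        return ("insertion", difference)
    return ("deletion", -difference)
```

A result of "no size change" does not exclude a point mutation, which is why the
hybridization, RNase A digestion and melting temperature approaches are
described separately.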

Sequence differences between a reference gene and genes having
mutations also may be revealed by direct DNA sequencing. In addition, cloned DNA
segments may be employed as probes to detect specific DNA segments. The
sensitivity of such methods can be greatly enhanced by appropriate use of PCR or
another amplification method. For example, a sequencing primer can be used with
double-stranded PCR product or a single-stranded template molecule generated by
a modified PCR. The sequence determination is performed by conventional
procedures with radiolabeled nucleotide or by automatic sequencing procedures with
fluorescent tags.

Genetic characterization based on DNA sequence differences may be
achieved by detection of alteration in electrophoretic mobility of DNA fragments in
gels, with or without denaturing agents. Small sequence deletions and insertions
can be visualized by high resolution gel electrophoresis. DNA fragments of different
sequences may be distinguished on denaturing formamide gradient gels in which the
mobilities of different DNA fragments are retarded in the gel at different positions
according to their specific melting or partial melting temperatures (see, e.g., Myers et
al., Science, 230: 1242 (1985)).

Sequence changes at specific locations also may be revealed by nuclease
protection assays, such as RNase and S1 protection or the chemical cleavage
method (e.g., Cotton et al., Proc. Natl. Acad. Sci., USA, 85:4397-4401 (1985)).

Thus, the detection of a specific DNA sequence may be achieved by
methods such as hybridization, RNase protection, chemical cleavage, direct DNA
sequencing or the use of restriction enzymes, e.g., restriction fragment length
polymorphisms (RFLP) and Southern blotting of genomic DNA.

In addition to more conventional gel-electrophoresis and DNA sequencing,
mutations also can be detected by in situ analysis.

Cells carrying mutations or polymorphisms in the gene of the present
invention may also be detected at the DNA level by a variety of techniques, to allow
for serotyping, for example. For example, RT-PCR can be used to detect mutations.
It is particularly preferred to use RT-PCR in conjunction with automated detection
systems, such as, for example, GeneScan. RNA or cDNA may also be used for the
same purpose, PCR or RT-PCR. As an example, PCR primers complementary to
the nucleic acid encoding the polypeptide of the present invention can be used to
identify and analyze mutations. The primers may be used to amplify the gene
isolated from the individual such that the gene may then be subject to various
techniques for elucidation of the DNA sequence. In this way, mutations in the DNA
sequence may be diagnosed.

The invention provides a process for diagnosing disease arising from
infection with S. epidermidis, comprising determining from a sample isolated or
derived from an individual an increased level of expression of a polynucleotide
having the sequence of a polynucleotide set forth in the Sequence Listing.
Increased expression of a polynucleotide can be measured using any one of the
methods well known in the art for the quantitation of polynucleotides, such as, for
example, PCR, RT-PCR, RNase protection, Northern blotting and other
hybridization methods.

Polypeptide assays 

The present invention also relates to diagnostic assays such as quantitative
and diagnostic assays for detecting levels of the polypeptide of the present invention
in cells and tissues, including determination of normal and abnormal levels. Thus,
for instance, a diagnostic assay in accordance with the invention for detecting
over-expression of the polypeptide compared to normal control tissue samples may be
used to detect the presence of an infection, for example, and to identify the infecting
organism. Assay techniques that can be used to determine levels of a polypeptide
in a sample derived from a host are well-known to those of skill in the art. Such
assay methods include radioimmunoassays, competitive-binding assays, Western
Blot analysis and ELISA assays. Among these, ELISAs frequently are preferred. An
ELISA assay initially comprises preparing an antibody specific to the polypeptide,
preferably a monoclonal antibody. In addition, a reporter antibody generally is
prepared which binds to the monoclonal antibody. The reporter antibody is attached
to a detectable reagent such as a radioactive, fluorescent or enzymatic reagent, such
as horseradish peroxidase enzyme.

Antibodies

The polypeptides, their fragments or other derivatives, or analogs thereof, or
cells expressing them can be used as an immunogen to produce antibodies thereto.
The present invention includes, for example, monoclonal and polyclonal antibodies,
chimeric, single chain, and humanized antibodies, as well as Fab fragments, or the
product of an Fab expression library.

Antibodies generated against the polypeptides corresponding to a sequence
of the present invention can be obtained by direct injection of the polypeptides into
an animal or by administering the polypeptides to an animal, preferably a nonhuman.
The antibody so obtained will then bind the polypeptide itself. In this manner, even a
sequence encoding only a fragment of the polypeptides can be used to generate
antibodies binding the whole native polypeptides. Such antibodies can then be used
to isolate the polypeptide from tissue expressing that polypeptide.

For preparation of monoclonal antibodies, any technique known in the art
which provides antibodies produced by continuous cell line cultures can be used.
Examples include various techniques, such as those in Kohler, G. and Milstein, C.,
Nature 256: 495-497 (1975); Kozbor et al., Immunology Today 4: 72 (1983); Cole et
al., pg. 77-96 in MONOCLONAL ANTIBODIES AND CANCER THERAPY, Alan R.
Liss, Inc. (1985); U.S. Patent No. 5,545,403; U.S. Patent No. 5,545,405; U.S. Patent
No. 5,654,403; U.S. Patent No. 5,792,838; U.S. Patent No. 5,316,938; U.S. Patent
No. 5,633,162; U.S. Patent No. 5,644,036; U.S. Patent No. 5,858,725.

Techniques described for the production of single chain antibodies (U.S.
Patent No. 4,946,778) can be adapted to produce single chain antibodies to
immunogenic polypeptide products of this invention. Also, transgenic mice, or other
organisms such as other mammals, may be used to express humanized antibodies
to immunogenic polypeptide products of this invention.

Alternatively, phage display technology could be utilized to select antibody
genes with binding activities towards the polypeptide either from repertoires of
PCR amplified v-genes of lymphocytes from humans screened for possessing
anti-Fbp or from naive libraries (McCafferty, J. et al., (1990), Nature 348, 552-554;
Marks, J. et al., (1992) Biotechnology 10, 779-783). The affinity of these
antibodies can also be improved by chain shuffling (Clackson, T. et al., (1991)
Nature 352, 624-628).

If two antigen binding domains are present, each domain may be directed 
against a different epitope - termed 'bispecific' antibodies. 

The above-described antibodies may be employed to isolate or to identify
clones expressing the polypeptide or purify the polypeptide of the present invention
by attachment of the antibody to a solid support for isolation and/or purification by
affinity chromatography.

Thus, among others, antibodies against the polypeptide of the present
invention may be employed to inhibit and/or treat infections, particularly bacterial
infections and especially infections arising from S. epidermidis.

Polypeptide derivatives include antigenically, epitopically or
immunologically equivalent derivatives which form a particular aspect of this
invention. The term "antigenically equivalent derivative" as used herein
encompasses a polypeptide or its equivalent which will be specifically recognized
by certain antibodies which, when raised to the protein or polypeptide according to
the present invention, interfere with the immediate physical interaction between
pathogen and mammalian host. The term "immunologically equivalent derivative"
as used herein encompasses a peptide or its equivalent which, when used in a
suitable formulation to raise antibodies in a vertebrate, yields antibodies that act to
interfere with the immediate physical interaction between pathogen and
mammalian host.

The polypeptide, such as an antigenically or immunologically equivalent
derivative or a fusion protein thereof, can be used as an antigen to immunize a
mouse or other animal such as a rat or chicken. The fusion protein may provide
stability to the polypeptide. The antigen may be associated, for example by
conjugation, with an immunogenic carrier protein, for example bovine serum
albumin (BSA) or keyhole limpet haemocyanin (KLH). Alternatively, a multiple
antigenic peptide comprising multiple copies of the protein or polypeptide, or an
antigenically or immunologically equivalent polypeptide thereof, may be
sufficiently antigenic to improve immunogenicity so as to obviate the use of a
carrier.

Preferably the antibody or derivative thereof is modified to make it less
immunogenic in the individual. For example, if the individual is human the
antibody may most preferably be "humanized," wherein the complementarity
determining region(s) of the hybridoma-derived antibody has been transplanted
into a human monoclonal antibody, for example as described in Jones, P. et al.
(1986), Nature 321, 522-525 or Tempest et al., (1991) Biotechnology 9, 266-273.

The use of a polynucleotide of the invention in genetic immunization will
preferably employ a suitable delivery method such as direct injection of plasmid
DNA into muscle (Wolff et al., Hum Mol Genet 1992, 1:363; Manthorpe et al.,
Hum. Gene Ther. 1993:4, 419), delivery of DNA complexed with specific protein
carriers (Wu et al., J Biol Chem 1989:264,16985), coprecipitation of DNA with
calcium phosphate (Benvenisty & Reshef, PNAS, 1986:83,9551), encapsulation of
DNA in various forms of liposomes (Kaneda et al., Science 1989:243,375),
particle bombardment (Tang et al., Nature 1992, 356:152; Eisenbraun et al., DNA
Cell Biol 1993, 12:791) and in vivo infection using cloned retroviral vectors
(Seeger et al., PNAS 1984:81,5849).

Binding molecules and assays 

This invention also provides a method for identification of molecules, such as
binding molecules, that bind to the polypeptide of the present invention. Genes
encoding proteins that bind to the polypeptide can be identified by numerous
methods known to those of skill in the art, for example, ligand panning and FACS
sorting. Such methods are described in many laboratory manuals such as, for
instance, Coligan et al., Current Protocols in Immunology 1(2): Chapter 5 (1991).
Also, a labeled ligand can be photoaffinity linked to a cell extract. Polypeptides of
the invention also can be used to assess the binding capacity of a binding molecule,
in cells or in cell-free preparations.

Polypeptides of the invention may also be used to assess the binding of
small molecule substrates and ligands in, for example, cells, cell-free preparations,
chemical libraries, and natural product mixtures. These substrates and ligands may
be natural substrates and ligands or may be structural or functional mimetics.

The invention further provides a complex of a polypeptide and a binding
molecule which comprises a polypeptide as described herein and a binding molecule
capable of modulating the activity of the polypeptide. A complex of this kind can
arise in vivo upon administration to a patient of a binding molecule as described
herein.

Antagonists and agonists - assays and molecules

The invention also provides a method of screening compounds to identify
those which enhance (agonist) or block (antagonist) the function of polypeptides or
polynucleotides of the present invention, such as their interaction with a binding
molecule. The method of screening may involve high-throughput techniques.

For example, to screen for agonists or antagonists, a synthetic reaction mix,
a cellular compartment, such as a membrane, cell envelope or cell wall, or a
preparation of any thereof, may be prepared from a cell that expresses a molecule
that binds to the polypeptide of the present invention. The preparation is incubated
with labeled polypeptide in the absence or the presence of a candidate molecule
which may be an agonist or antagonist. The ability of the candidate molecule to bind
the binding molecule is reflected in decreased binding of the labeled ligand.
Molecules which bind gratuitously, i.e., without inducing the functional effects of the
polypeptide, are most likely to be good antagonists. Molecules that bind well and
elicit functional effects that are the same as or closely related to the polypeptide are
good agonists.

The functional effects of potential agonists and antagonists may be
measured, for instance, by determining activity of a reporter system following
interaction of the candidate molecule with a cell or appropriate cell preparation, and
comparing the effect with that of the polypeptide of the present invention or
molecules that elicit the same effects as the polypeptide. Reporter systems that may
be useful in this regard include but are not limited to a colorimetric labeled substrate
converted into product, a reporter gene that is responsive to changes in the
functional activity of the polypeptide, and binding assays known in the art.

Another example of an assay for antagonists is a competitive assay that
combines the polypeptide of the present invention and a potential antagonist with
membrane-bound binding molecules, recombinant binding molecules, natural
substrates or ligands, or substrate or ligand mimetics, under appropriate conditions
for a competitive inhibition assay. The polypeptide can be labeled such as by
radioactivity or a colorimetric compound, such that the number of polypeptide
molecules bound to a binding molecule or converted to product can be determined
accurately to assess the effectiveness of the potential antagonist.
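
One common way of expressing the outcome of such a competitive assay is as a
percent inhibition of labeled polypeptide binding. The following Python sketch is
offered only as an illustration; the function name percent_inhibition, its
parameters and the background correction are assumptions and are not specified
in this disclosure.

```python
def percent_inhibition(bound_no_antagonist, bound_with_antagonist, background=0.0):
    """Percent reduction in bound labeled polypeptide caused by a candidate.

    All three arguments are signal readings in the same units (for
    example counts per minute or absorbance). background is an assumed
    non-specific signal subtracted from both readings.
    """
    control = bound_no_antagonist - background
    if control <= 0:
        raise ValueError("control binding must exceed background")
    treated = bound_with_antagonist - background
    # 0 means no effect; 100 means binding was reduced to background.
    return 100.0 * (1.0 - treated / control)
```

A negative value indicates that more labeled polypeptide was bound in the
presence of the candidate than in its absence.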

Potential antagonists include small organic molecules, peptides, polypeptides
and antibodies that bind to a polypeptide of the invention and thereby inhibit or
extinguish its activity. Potential antagonists also may be small organic molecules, a
peptide, a polypeptide such as a closely related protein or antibody that binds to the
same sites on a binding molecule without inducing functional activity of the
polypeptide of the invention.

Potential antagonists include a small molecule which binds to and occupies
the binding site of the polypeptide thereby preventing binding to cellular binding
molecules, such that normal biological activity is prevented. Examples of small
molecules include but are not limited to small organic molecules, peptides or
peptide-like molecules.

Other potential antagonists include antisense molecules (see Okano, J.
Neurochem. 56: 560 (1991); OLIGODEOXYNUCLEOTIDES AS ANTISENSE
INHIBITORS OF GENE EXPRESSION, CRC Press, Boca Raton, FL (1988), for a
description of these molecules).

Preferred potential antagonists include derivatives of the polypeptide of the 
invention. 

In a particular aspect, the invention provides the use of the polypeptide,
polynucleotide or inhibitor of the invention to interfere with the initial physical
interaction between a pathogen and mammalian host responsible for sequelae of
infection. In particular the molecules of the invention may be used: i) in the
prevention of adhesion of S. epidermidis to mammalian extracellular matrix
proteins on in-dwelling devices or to extracellular matrix proteins in wounds; ii) to
block protein mediated mammalian cell invasion by, for example, initiating
phosphorylation of mammalian tyrosine kinases (Rosenshine et al., Infect. Immun.
60:2211 (1992)); iii) to block bacterial adhesion between mammalian extracellular
matrix proteins and bacterial proteins which mediate tissue damage; iv) to block
the normal progression of pathogenesis in infections initiated other than by the
implantation of in-dwelling devices or by other surgical techniques.

Each of the DNA coding sequences provided herein may be used in the
discovery and development of antibacterial compounds. The encoded protein
upon expression can be used as a target for the screening of antibacterial drugs.
Additionally, the DNA sequences encoding the amino terminal regions of the
encoded protein or Shine-Dalgarno or other translation facilitating sequences of
the respective mRNA can be used to construct antisense sequences to control the
expression of the coding sequence of interest.
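
Constructing an antisense sequence against such a region amounts to taking the
reverse complement of the chosen stretch of the sense strand. The sketch below,
in Python, assumes the sense strand is available as a plain DNA string; the
function name antisense_dna and its parameters are hypothetical and are given
only to illustrate the principle.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_dna(sense, start, length):
    """Return the antisense DNA, written 5' to 3', for a sense-strand region.

    sense  -- sense (coding) strand, 5' to 3'
    start  -- index of the first base of the target region
    length -- number of bases to target, for example a region spanning
              the Shine-Dalgarno sequence and the initiation codon
    """
    target = sense[start:start + length]
    # Complement each base, then reverse so the result reads 5' to 3'.
    return target.translate(_COMPLEMENT)[::-1]
```

For example, targeting the first twelve bases of the illustrative sense sequence
AGGAGGTTTATGAAA gives the antisense sequence CATAAACCTCCT.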

The antagonists and agonists may be employed, for instance, to inhibit
diseases arising from infection with Staphylococcus, especially S. epidermidis, such
as sepsis and endocarditis.

Vaccines 

Another aspect of the invention relates to a method for inducing an
immunological response in an individual, particularly a mammal, which comprises
inoculating the individual with the polypeptide of the invention, or a fragment or
variant thereof, adequate to produce antibody to protect said individual from
infection, particularly bacterial infection and most particularly Staphylococcus
infections. Yet another aspect of the invention relates to a method of inducing an
immunological response in an individual which comprises, through gene therapy
or otherwise, delivering a nucleic acid functionally encoding the polypeptide, or a
fragment or a variant thereof, for expressing the polypeptide, or a fragment or a
variant thereof, in vivo in order to induce an immunological response to produce
antibodies or a cell mediated T cell response, either cytokine-producing T cells or
cytotoxic T cells, to protect said individual from disease, whether that disease is
already established within the individual or not. One way of administering the
gene is by accelerating it into the desired cells as a coating on particles or
otherwise.

A further aspect of the invention relates to an immunological composition
which, when introduced into a host capable of having induced within it an
immunological response, induces an immunological response in such host,
wherein the composition comprises recombinant DNA which codes for and
expresses an antigen of the polypeptide of the present invention. The
immunological response may be used therapeutically or prophylactically and may
take the form of antibody immunity or cellular immunity such as that arising from
CTL or CD4+ T cells.

The polypeptide of the invention or a fragment thereof may be fused with a
co-protein which may not by itself produce antibodies, but is capable of stabilizing
the first protein and producing a fused protein which will have immunogenic and
protective properties. This fused recombinant protein preferably further comprises
an antigenic co-protein, such as Glutathione-S-transferase (GST) or
beta-galactosidase, relatively large co-proteins which solubilise the protein and
facilitate production and purification thereof. Moreover, the co-protein may act as
an adjuvant in the sense of providing a generalized stimulation of the immune
system. The co-protein may be attached to either the amino or carboxy terminus
of the first protein.

Provided by this invention are compositions, particularly vaccine
compositions, and methods comprising the polypeptides or polynucleotides of the
invention and immunostimulatory DNA sequences, such as those described in
Sato, Y. et al., Science 273: 352 (1996).

Also provided by this invention are methods using the described
polynucleotide or particular fragments thereof which have been shown to encode
non-variable regions of bacterial cell surface proteins in DNA constructs used in
such genetic immunization experiments in animal models of infection with S.
epidermidis. Such fragments will be particularly useful for identifying protein
epitopes able to provoke a prophylactic or therapeutic immune response. This
approach can allow for the subsequent preparation of monoclonal antibodies of
particular value from the requisite organ of the animal successfully resisting or
clearing infection for the development of prophylactic agents or therapeutic
treatments of S. epidermidis infection in mammals, particularly humans.

The polypeptide may be used as an antigen for vaccination of a host to
produce specific antibodies which protect against invasion of bacteria, for example
by blocking adherence of bacteria to damaged tissue. Examples of tissue
damage include wounds in skin or connective tissue caused e.g. by mechanical,
chemical or thermal damage or by implantation of indwelling devices, or wounds
in the mucous membranes, such as the mouth, mammary glands, urethra or
vagina.

The present invention also includes a vaccine formulation which comprises
the immunogenic recombinant protein together with a suitable carrier. Since the
protein may be broken down in the stomach, it is preferably administered
parenterally, including, for example, administration that is subcutaneous,
intramuscular, intravenous, or intradermal. Formulations suitable for parenteral
administration include aqueous and non-aqueous sterile injection solutions which
may contain anti-oxidants, buffers, bacteriostats and solutes which render the
formulation isotonic with the bodily fluid, preferably the blood, of the individual;
and aqueous and non-aqueous sterile suspensions which may include
suspending agents or thickening agents. The formulations may be presented in
unit-dose or multi-dose containers, for example, sealed ampoules and vials, and
may be stored in a freeze-dried condition requiring only the addition of the sterile
liquid carrier immediately prior to use. The vaccine formulation may also include
adjuvant systems for enhancing the immunogenicity of the formulation, such as
oil-in-water systems and other systems known in the art. The dosage will depend
on the specific activity of the vaccine and can be readily determined by routine
experimentation.

While the invention has been described with reference to certain
polypeptides, it is to be understood that this covers fragments of the naturally
occurring protein and similar proteins with additions, deletions or substitutions
which do not substantially affect the immunogenic properties of the recombinant
protein.

Compositions 

The invention also relates to compositions comprising the polynucleotide or
the polypeptides discussed above or the agonists or antagonists. Thus, the
polypeptides of the present invention may be employed in combination with a
non-sterile or sterile carrier or carriers for use with cells, tissues or organisms, such as a
pharmaceutical carrier suitable for administration to a subject. Such compositions
comprise, for instance, a media additive or a therapeutically effective amount of a
polypeptide of the invention and a pharmaceutically acceptable carrier or excipient.
Such carriers may include, but are not limited to, saline, buffered saline, dextrose,
water, glycerol, ethanol and combinations thereof. The formulation should suit the
mode of administration.

Kits

The invention further relates to diagnostic and pharmaceutical packs and kits 
comprising one or more containers filled with one or more of the ingredients of the 
aforementioned compositions of the invention. The ingredient(s) can be present in a 
useful amount, dosage, formulation or combination. Associated with such 
container(s) can be a notice in the form prescribed by a governmental agency
regulating the manufacture, use or sale of pharmaceuticals or biological products, 
reflecting approval by the agency of the manufacture, use or sale of the product for 
human administration. 
Administration

Polypeptides and other compounds of the present invention may be 
employed alone or in conjunction with other compounds, such as therapeutic 
compounds. 

The pharmaceutical compositions may be administered in any effective,
convenient manner including, for instance, administration by topical, oral, anal,
vaginal, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or
intradermal routes among others.

The pharmaceutical compositions generally are administered in an amount
effective for treatment or prophylaxis of a specific indication or indications. In general,
the compositions are administered in an amount of active agent of at least about 10
µg/kg body weight. In most cases they will be administered in one or more doses in
an amount not in excess of about 8 mg/kg body weight per day. Preferably, in most
cases, the dose is from about 10 µg/kg to about 1 mg/kg body weight, daily. For
administration particularly to mammals, and particularly humans, it is expected that
the daily dosage level of the active agent will be from 0.01 mg/kg to 10 mg/kg and
typically around 1 mg/kg. For example, a dose may be 1 mg/kg daily. It will be
appreciated that optimum dosage will be determined by standard methods for each
treatment modality and indication, taking into account the indication, its severity,
route of administration, complicating conditions and the like. The physician in any
event will determine the actual dosage which will be most suitable for an individual
and will vary with the age, weight and response of the particular individual. The
above dosages are exemplary of the average case. There can, of course, be
individual instances where higher or lower dosage ranges are merited, and such
are within the scope of this invention.
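
As a worked illustration of the per-kilogram dosage levels above, the following minimal Python sketch converts the recited daily levels (0.01 mg/kg, 1 mg/kg and 10 mg/kg) into absolute daily amounts. The function name `daily_dose_mg` and the 70 kg body weight are assumptions introduced here for illustration only; the sketch is plain arithmetic and is not dosing guidance.

```python
def daily_dose_mg(weight_kg: float, dose_mg_per_kg: float) -> float:
    """Absolute daily amount in mg for a per-kilogram dose level."""
    if weight_kg <= 0 or dose_mg_per_kg < 0:
        raise ValueError("weight must be positive and dose level non-negative")
    return weight_kg * dose_mg_per_kg


# Daily dosage levels recited in the text, in mg per kg body weight.
LOW_MG_PER_KG = 0.01
TYPICAL_MG_PER_KG = 1.0
HIGH_MG_PER_KG = 10.0

weight_kg = 70.0  # hypothetical body weight, chosen only for the example

for label, level in (("low", LOW_MG_PER_KG),
                     ("typical", TYPICAL_MG_PER_KG),
                     ("high", HIGH_MG_PER_KG)):
    print(f"{label}: {daily_dose_mg(weight_kg, level):.1f} mg/day")
```

The actual dosage remains a matter for the physician, as the text states; the sketch only shows how a per-kilogram level scales with body weight.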

In therapy or as a prophylactic, the active agent may be administered to an 
individual as an injectable composition, for example as a sterile aqueous 
dispersion, preferably isotonic. 

Alternatively the composition may be formulated for topical application, for
example in the form of ointments, creams, lotions, eye ointments, eye drops, ear
drops, mouthwash, impregnated dressings and sutures and aerosols, and may
contain appropriate conventional additives, including, for example, preservatives,
solvents to assist drug penetration, and emollients in ointments and creams.
Such topical formulations may also contain compatible conventional carriers, for
example cream or ointment bases, and ethanol or oleyl alcohol for lotions. Such
carriers may constitute from about 1% to about 98% by weight of the formulation;
more usually they will constitute up to about 80% by weight of the formulation.

The pharmaceutical composition may be administered in conjunction with 
an in-dwelling device. In-dwelling devices include surgical implants, prosthetic 
devices and catheters, i.e., devices that are introduced to the body of an individual
and remain in position for an extended time. Such devices include, for example, 
artificial joints, heart valves, pacemakers, vascular grafts, vascular catheters, 
cerebrospinal fluid shunts, urinary catheters, continuous ambulatory peritoneal 
dialysis (CAPD) catheters, etc. 

The composition of the invention may be administered by injection to
achieve a systemic effect against relevant bacteria shortly before insertion of an
in-dwelling device. Treatment may be continued after surgery during the in-body
time of the device. In addition, the composition could also be used to broaden
perioperative cover for any surgical technique to prevent Staphylococcus wound
infections.

Many orthopaedic surgeons consider that humans with prosthetic joints 
should be considered for antibiotic prophylaxis before dental treatment that could 
produce a bacteremia. Late deep infection is a serious complication sometimes 
leading to loss of the prosthetic joint and is accompanied by significant morbidity 
and mortality. It may therefore be possible to extend the use of the active agent
as a replacement for prophylactic antibiotics in this situation. 

In addition to the therapy described above, the compositions of this 
invention may be used generally as a wound treatment agent to prevent adhesion 
of bacteria to matrix proteins exposed in wound tissue and for prophylactic use in 
dental treatment as an alternative to, or in conjunction with, antibiotic prophylaxis.

Alternatively, the composition of the invention may be used to bathe an 
indwelling device immediately before insertion. The active agent will preferably be 
present at a concentration of 1 µg/ml to 10 mg/ml for bathing of wounds or
indwelling devices. 

A vaccine composition is conveniently in injectable form. Conventional
adjuvants may be employed to enhance the immune response. A suitable unit
dose for vaccination is 0.5-5 µg/kg of antigen, and such dose is preferably
administered 1-3 times and with an interval of 1-3 weeks.

With the indicated dose range, no adverse toxicological effects should be
observed with the compounds of the invention which would preclude their
administration to suitable individuals.
The antibodies described above may also be used as diagnostic reagents
to detect the presence of bacteria containing the protein. 

In order to facilitate understanding of the following examples, certain
frequently occurring methods and/or terms are explained in the foregoing glossary.
The present invention is further described by the following examples. While
illustrating certain specific aspects of the invention, the examples do not portray the
limitations or circumscribe the scope of the disclosed invention.

All examples were carried out using routine molecular biology techniques as 
generally described in standard laboratory manuals, such as Sambrook et al.,
MOLECULAR CLONING: A LABORATORY MANUAL, 2nd Ed.; Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y. (1989).

EXAMPLES 

A small insert plasmid library was generated in the minimal sequencing
vector pOT2a (O. Hubbard, C. Martin, and M. Palazzolo, unpublished). pOT2a
vector was prepared by BstXI digestion of the parent plasmid pOT2a-sacB
followed by preparative agarose gel electrophoresis to separate the 1.6 kb vector
fragment from a B. subtilis sacB gene fragment. To prepare inserts for library
construction, S. epidermidis SR1 strain genomic DNA was sonicated and the
resulting random fragments were end-repaired with Klenow and T4 polymerase
and phosphorylated with T4 polynucleotide kinase. Oligos (5'-CTCTAAAG-3',
5'-CTTTAGAGCACA-3') (SEQ ID NO.:4465) to create BstXI adaptors were
annealed and ligated to the blunt-ended fragments. The configuration of the BstXI
sites in pOT2a and the sequence of the adaptors allowed a ligation strategy that
minimized the recovery of clones without insert (Seed, 1987). DNA samples were
electrophoresed on a low-melting-temperature agarose gel and fragments of
3000-4000 bp were isolated and purified. The linearized vector and random DNA
fragments were ligated overnight using T4 DNA ligase at 16°C and transformed
into DH10B competent cells (Life Technologies Inc., Gaithersburg, MD) by
electroporation. Transformed bacteria were selected on LB agar plates containing
5% sucrose and 12.5 µg/ml chloramphenicol. Sequencing templates were
isolated from single colonies and purified using R.E.A.L. Prep 96 Plasmid Kit
(QIAGEN, Chatsworth, CA). Seq01 primer (5'-CACTATAGAACTCGAGCAGCTG-3')
(SEQ ID NO.:4466) and seq02 primer (5'-CGACTCACTATAGGGAGACCG-3')
(SEQ ID NO.:4467) were used to generate end-sequence using ABI Prism BigDye
Terminators (PE Applied Biosystems, Foster City, CA).
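
The annealing of the two adaptor oligos can be checked with a short sketch. Assuming the 8-mer pairs with the 5' end of the 12-mer, the code below computes the reverse complement of the 8-mer and reports the unpaired 3' tail of the 12-mer. The function names are hypothetical helpers introduced here; the text does not spell out the sequence of the vector ends, so the sketch says nothing about how the tail pairs with BstXI-cut pOT2a.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def unpaired_tail(short_oligo: str, long_oligo: str) -> str:
    """Return the 3' tail of long_oligo left single-stranded when
    short_oligo anneals to its 5' end."""
    paired = reverse_complement(short_oligo)
    if not long_oligo.upper().startswith(paired):
        raise ValueError("oligos do not pair at the 5' end of the long strand")
    return long_oligo.upper()[len(paired):]


SHORT_OLIGO = "CTCTAAAG"      # adaptor oligo from the text
LONG_OLIGO = "CTTTAGAGCACA"   # adaptor oligo from the text

print(reverse_complement(SHORT_OLIGO))          # CTTTAGAG
print(unpaired_tail(SHORT_OLIGO, LONG_OLIGO))   # CACA
```

The annealed adaptor therefore has one blunt end, available for ligation to the end-repaired genomic fragments, and one four-base single-stranded end.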

Constructs from the pOT2a library were transformed into POX38 bacteria
and selected on LB agar plates containing 12.5 µg/ml chloramphenicol. A single
colony from each construct was used to inoculate an overnight culture. These
POX38 cultures were mated with a culture of the F-bearing, kanamycin-resistant
JGM strain by combining the two strains and shaking for 3 hours at 37°C without
antibiotics. Each successful mating event resulted in the random insertion of a
single gamma-delta transposon into the pOT2a construct. This collection of
transpositions was captured in the JGM cells by selection of the mated cultures on
LB agar plates containing 12.5 µg/ml chloramphenicol and 25 µg/ml kanamycin. A
transposon library was created for each of the original pOT2a library constructs by
picking 96 individual colonies. A set of two PCR reactions was performed on each
of the 96 library members to determine the position of the transposon integration.
PM001 primer (5'-CGTTAGAACGCGGCTACA-3') (SEQ ID NO.:4468) and
NGDIR primer (5'-GTTCCATTGGCCCTCAAAC-3') (SEQ ID NO.:4469) were used
to determine the integration site distance from the left side of the vector and
PM002 primer (5'-GCCGATTCATTAATGCAGGT-3') (SEQ ID NO.:4470) and
NGDIR primer were used to confirm the integration position by measuring the
distance from the right side of the vector. PCR products were electrophoresed in
1X TBE on 1.4% agarose gels. After gel analysis, a subset of transposon clones
was selected for sequencing based upon obtaining an integration site about every
300 bp along the full length of the pOT2a insert. Sequencing templates were
purified using R.E.A.L. Prep 96 Plasmid Kit (QIAGEN, Chatsworth, CA). M21
primer (5'-GTAAAACGACGGCCAGT-3') (SEQ ID NO.:4471) and rev primer
(5'-CAGGAAACAGCTATGAC-3') (SEQ ID NO.:4472) were used to generate internal
sequence using ABI Prism BigDye Terminators (PE Applied Biosystems, Foster
City, CA).
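
The selection of a clone subset with an integration site about every 300 bp can be sketched as a simple nearest-to-target pick. This is an illustrative assumption about how such a subset might be chosen, not the procedure actually used; the function name `select_clones`, the clone labels and the positions are all hypothetical.

```python
def select_clones(positions: dict[str, int], insert_length: int,
                  spacing: int = 300) -> list[str]:
    """For each target position (spacing, 2*spacing, ...) along the insert,
    pick the unused clone whose integration site lies nearest the target."""
    remaining = dict(positions)
    chosen = []
    for target in range(spacing, insert_length, spacing):
        if not remaining:
            break
        # Ties are broken by clone name so the result is deterministic.
        best = min(remaining, key=lambda name: (abs(remaining[name] - target), name))
        chosen.append(best)
        del remaining[best]
    return chosen


# Hypothetical integration sites (bp from the left side of the vector),
# as might be estimated from PCR product sizes on a gel.
sites = {"A1": 150, "A2": 310, "A3": 320, "B1": 640, "B2": 900, "C1": 1250}
print(select_clones(sites, insert_length=1500))  # ['A2', 'B1', 'B2', 'C1']
```

With inserts of 3000-4000 bp, this spacing would call for roughly ten or more clones per construct out of the 96 picked.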

The sequences, including ORFs (nucleic acid sequences within SEQ ID
NOs 1-3334) and non-ORFs (SEQ ID NOs 3335-4464), are set forth in the
Sequence Listing. The non-ORF regions may be particularly useful as diagnostic
sequences. The ribosomal RNA genes may also be useful to distinguish between 
species. Also, intergenic regions generally may be useful diagnostics to establish 
genus and species of an unidentified microbe, as there may be less selective 
pressure to maintain fidelity of the sequences in these intergenic regions as
compared to intragenic regions. 
About 26 different isolates of S. epidermidis have been submitted to ATCC and
are listed in its on-line catalog, as shown below:

1: ATCC Number 146 Organism: Staphylococcus epidermidis
2: ATCC Number 33501 Organism: Staphylococcus epidermidis
3: ATCC Number 49741 Organism: Staphylococcus epidermidis
4: ATCC Number 51625 Organism: Staphylococcus epidermidis
5: ATCC Number 29997 Organism: Staphylococcus epidermidis
6: ATCC Number 19654 Organism: Staphylococcus epidermidis
7: ATCC Number 14389 Organism: Staphylococcus sp. deposit
8: ATCC Number 14852 Organism: Staphylococcus epidermidis
9: ATCC Number 49134 Organism: Staphylococcus epidermidis
10: ATCC Number 13518 Organism: Staphylococcus epidermidis
11: ATCC Number 9491 Organism: Staphylococcus epidermidis
12: ATCC Number 35547 Organism: Staphylococcus epidermidis
13: ATCC Number 35984 Organism: Staphylococcus epidermidis
14: ATCC Number 35983 Organism: Staphylococcus epidermidis
15: ATCC Number 700296 Organism: Staphylococcus epidermidis
16: ATCC Number 49461 Organism: Staphylococcus epidermidis
17: ATCC Number 29641 Organism: Staphylococcus epidermidis
18: ATCC Number 29887 Organism: Staphylococcus epidermidis
19: ATCC Number 29886 Organism: Staphylococcus epidermidis
20: ATCC Number 55133 Organism: Staphylococcus epidermidis
21: ATCC Number 27626 Organism: Staphylococcus sp. deposit
22: ATCC Number 31874 Organism: Staphylococcus epidermidis
23: ATCC Number 14990 Organism: Staphylococcus epidermidis
24: ATCC Number 155 Organism: Staphylococcus sp. deposit
25: ATCC Number 155-U Organism: Staphylococcus sp. deposit
26: ATCC Number 12228 Organism: Staphylococcus epidermidis

Throughout this application, various publications are referenced. These
publications are hereby incorporated by reference in their entirety.

While the invention has been described with respect to certain specific 
embodiments, it will be appreciated that many modifications and changes may be 
made by those skilled in the art without departing from the spirit of the invention. 

It is intended, therefore, by the appended claims, to cover all such modifications
and changes as fall within the true spirit and scope of the invention.
What is claimed is:

1. An isolated polynucleotide comprising a member selected from the
group consisting of: 

(a) a polynucleotide encoding a polypeptide having at least a 70% 
identity to a polypeptide set forth in the Sequence Listing; 

(b) a polynucleotide which is complementary to the polynucleotide of (a); 

and 

(c) a polynucleotide comprising at least 15 sequential bases of the 
polynucleotide of (a) or (b). 

2. The polynucleotide of Claim 1 wherein the polypeptide has at least 
80% identity to the polypeptide set forth in the Sequence Listing. 

3. The polynucleotide of Claim 2 wherein the polypeptide has at least 90%
identity to the polypeptide set forth in the Sequence Listing. 

4. The polynucleotide of Claim 1 wherein the polynucleotide is DNA. 

5. The polynucleotide of Claim 1 wherein the polynucleotide is RNA. 

6. The polynucleotide of Claim 4 wherein the polynucleotide has at least 
80% identity to a polynucleotide set forth in the Sequence Listing. 

7. The polynucleotide of Claim 6 wherein the polynucleotide has at least 
90% identity to a polynucleotide set forth in the Sequence Listing. 

8. The polynucleotide of Claim 4 comprising a polynucleotide set forth in 
the Sequence Listing. 

9. The polynucleotide of Claim 4 comprising a polynucleotide set forth in 
the Sequence Listing as any of the odd-numbered SEQ ID NOs 1-3334.

10. An isolated polynucleotide comprising a member selected from the 
group consisting of: 

(a) a polynucleotide having at least a 70% identity to a 
polynucleotide contained in any of ATCC Deposit Nos. 146; 33501; 49741; 
51625; 29997; 19654; 14389; 14852; 49134; 13518; 9491; 35547; 35984;
35983; 700296; 49461; 29641; 29887; 29886; 55133; 27626; 31874; 
14990; 155; 155-U; 12228 and substantially encoding the polypeptide 
comprising amino acids 1 to 416 of SEQ ID NO:2; 

(b) a polynucleotide complementary to the polynucleotide of (a); and 

(c) a polynucleotide comprising at least 15 sequential bases of the 
polynucleotide of (a) or (b). 
11. A vector comprising the DNA of Claim 4.

12. A host cell comprising the vector of Claim 11.

13. A process for producing a S. epidermidis polypeptide comprising 
expressing from the host cell of Claim 12 a polypeptide encoded by said DNA. 

14. A process for producing a cell which expresses a S. epidermidis 
polypeptide comprising transforming or transfecting the cell with the vector of Claim 
11 such that the cell expresses the polypeptide encoded by the DNA contained in 
the vector. 

15. A polypeptide comprising an amino acid sequence which is at least 
70% identical to a polypeptide set forth in the Sequence Listing. 

16. A polypeptide comprising an amino acid sequence which is at least 
80% identical to a polypeptide set forth in the Sequence Listing. 

17. A polypeptide comprising an amino acid sequence which is at least 
90% identical to a polypeptide set forth in the Sequence Listing. 

18. A polypeptide comprising an amino acid sequence as set forth in the 
Sequence Listing. 

19. An antibody against the polypeptide of claim 18. 

20. An antagonist which reduces or inhibits the activity of the polypeptide 
of claim 18. 

21. A method for the treatment of an individual having need to reduce or 
inhibit the activity of the polypeptide of Claim 18 comprising administering to the 
individual a therapeutically effective amount of the antagonist of Claim 20. 

22. A complex of a polypeptide and a binding molecule which comprises 
the polypeptide of Claim 18 and a binding molecule that is capable of antagonising 
the activity of the polypeptide. 

23. A process for diagnosing in a subject a disease related to expression 
of the polypeptide of claim 18 comprising detecting the presence in the subject of a 
nucleic acid sequence encoding said polypeptide. 

24. A diagnostic process comprising detecting the presence of the 
polypeptide of claim 18 in a sample derived from a subject. 

25. A method for identifying compounds capable of inhibiting the activity 
of the polypeptide of claim 18 comprising:

(a) contacting a cell expressing the polypeptide on the surface thereof with a 
selected compound, under conditions to permit binding to the polypeptide in the 
presence of a component capable of providing a detectable signal in response to the 
binding of the compound to said polypeptide; and 
(b) detecting the presence or absence of a signal generated in response to
the binding of the compound to the polypeptide, 

the presence of a signal indicating a compound capable of inhibiting the activity of 
the polypeptide. 

26. A method for inducing an immunological response in a mammal which 
comprises inoculating the mammal with the polypeptide of Claim 15, or a fragment 
or variant thereof, adequate to protect said animal against infection from S. 
epidermidis. 

27. A method of inducing an immunological response in a mammal which 
comprises delivering a gene encoding the polypeptide of Claim 15, or a fragment 
or variant thereof, and obtaining expression of the gene in vivo in order to induce 
an immunological response to produce antibody to protect said animal against 
infection from S. epidermidis. 

28. An immunological composition comprising a DNA capable of 
expressing a polypeptide of Claim 15 which, when introduced into a mammal, 
induces an immunological response in the mammal, and a pharmaceutically
acceptable carrier. 

29. An immunological composition comprising a polypeptide of claim 15 
which, when introduced into a mammal, induces an immunological response in 
the mammal, and a pharmaceutically acceptable carrier.
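
The identity thresholds recited in the claims (70%, 80%, 90%) can be illustrated with a simple calculation. The claims do not define how identity is computed, so the sketch below rests on an assumption: identity is taken as the number of matching positions divided by the length of a pre-computed pairwise alignment, with `-` marking gaps. The function name and the two short example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pairwise alignment (assumed definition:
    identical non-gap columns divided by total alignment columns)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Hypothetical ten-residue alignment with two mismatches.
print(percent_identity("MQALRDKVGQ", "MQALKDKVGE"))  # 80.0
```

Other conventions, for example dividing by the length of the shorter sequence, give different values for the same pair, so any comparison against a threshold should state the convention used.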
SEQUENCE LISTING



<110> Kimmerle, Bill 


<120> STAPHYLOCOCCUS EPIDERMIDIS NUCLEIC ACIDS
Sequence 1 

Contig_0440_pos_11083_0,

putative peptide of unknown function

atgcaagctttacgagataaagtaggccaacaaaataacgttcaccaacaaagtaattat 
ttcaatgaagatgaacaaccaaaacataactatgataattctgtacaagccggtcaaact 
attattgataaacttcaagatccaatcatgaacaaaaatgaaattgagcaggctattaat 
caaatcaatacgactcaaacagcgttaagtggagaaaataaattacacactgaccaagaa 

agcacaaatagacaaatagaaggtttatctagtttgaacacagctcaaatcaacgccgaa
aaagatttagtcaatcaagctaaaacaagaacagatgttgctcaaaagttagctacagct 
aaagaaataaattctgctatgagtaatttaagagatggcattcaaaataaagaggacatc 
aaacgtagcagtgcatatatcaacgcagatccgactaaagttacagcttacgatcaagca 
ctacagaacgcagaaaatatcatcaatgccacaccaaacgtagagcttaataaagctaca 

attgaacaagcgctatcacgcgttcaacaagcacaacaagatcttgatggtgttcaacaa
ttagctaatgctaaacaacaagctacacaaactgtcaatgggttaaatagcttaaatgac 
ggtcaaaagcgtgaattaaatctattaattaattcagctaatacccgtacaaaagtacaa
gaagaattaaacaaagcaactgaatcgaaccatgcgatggaagctttaagaaacagtgtt 
caaaacgttgatcaagtaaaacaaagtaacaattatgtcaatgaagatcaacctgaacag 

cacaattatgataatgctgtcaatgaagctcaagctacaatcaacaacaatgctcaacct
gttctagacaaattagctatagaacgtttaactcaaactgttaacactacaaaagatgca 
ttacatggtactcaaaaactgatacaagaccaacaagctgctgaaactggaatacgtggt 
ttaacgagtctcaatgaacctcagaaaaatgctgaagtagctaaagtaactgcagcaaca 
acacgtgatgaagtgagaaatattcgtcaagaagcaacaacattagatactgcaatgctt

ggtttacgtaaaagcattaaagataaaaacgatactaaaaatagtagtaaatatattaat
gaggatcatgaccaacaacaagcttatgacaatgctgtaaataatg 

Sequence 2 

MQALRDKVGQQNNVHQQSNYFNEDEQPKHNYDNSVQAGQTIIDKLQDPIMNKNEIEQAIN 
QINTTQTALSGENKLHTDQESTNRQIEGLSSLNTAQINAEKDLVNQAKTRTDVAQKLATA
KEINSAMSNLRDGIQNKEDIKRSSAYINADPTKVTAYDQALQNAENIINATPNVELNKAT 
IEQALSRVQQAQQDLDGVQQLANAKQQATQTVNGLNSLNDGQKRELNLLINSANTRTKVQ
EELNKATESNHAMEALRNSVQNVDQVKQSNNYVNEDQPEQHNYDNAVNEAQATINNNAQP 
VLDKLAIERLTQTVNTTKDALHGTQKLIQDQQAAETGIRGLTSLNEPQKNAEVAKVTAAT 
TRDEVRNIRQEATTLDTAMLGLRKSIKDKNDTKNSSKYINEDHDQQQAYDNAVNNX
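
The relationship between each nucleotide sequence and the peptide that follows it can be checked by translating codons with the standard genetic code. The sketch below is illustrative only: the helper names are hypothetical, unknown codons are written as `X`, and a trailing partial codon is ignored (the listing itself ends Sequence 2 with `X`). Every codon is translated naively, so a GTG at the first position is rendered as V, which is also how the listing renders it.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; '*' marks a stop codon, 'X' an unknown codon."""
    dna = "".join(dna.upper().split())    # drop whitespace left by line wrapping
    usable = len(dna) - len(dna) % 3      # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, usable, 3))


# First line of Sequence 1 from the listing above.
first_line = "atgcaagctttacgagataaagtaggccaacaaaataacgttcaccaacaaagtaattat"
print(translate(first_line))  # MQALRDKVGQQNNVHQQSNY
```

The printed peptide matches the first twenty residues of Sequence 2. The same check applied to other pairs in the listing is a quick way to spot character-level damage in either sequence.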

Sequence 3 

Contig_0440_pos_9509_8859,

putative peptide of unknown function 

atgaatcgtattgcccatagttatggtttacatgatacatacagttttgtgacatcaact
gcaattattttctcattaaatgatcgtactagtacgaggttgattcgtattcgcgaacgt 
acaaccgatcttgagaaaattgctttaaccaatagcctatctcgtaaaatttcgagtaag 
caacttacaattgacgaagcaaaaagtgagttactgcaacttaaacgtgcgtctcttcag 
tattctttcttaacaaatctcattgctgcctttgtagcttgtggttttttcttattcatg 

tttggtggcgtagcttccgacgcttggattgcatgcctagcgggtggcatagctttttta
acgtttagtttcgtgcaaaaatatatacaaattaaattcttttcagagtttgtagcatct 
gctgttgttattagtattgcagcaatattcactaaactaggtatagctaaaaatcaagac 
attattactattgcaagtgtcatgcctctcgttcccggtattttgattactaacgctatt
cgtgacttacttgccggagagttacttgctggtatgtcacgtggtgttgaagctgcttta 

actgcatttgctattggtgcaggagtagctattgtattactattattataa



Sequence 4 
MNRIAHSYGLHDTYSFVTSTAIIFSLNDRTSTRLIRIRERTTDLEKIALTNSLSRKISSK
QLTIDEAKSELLQLKRASLQYSFLTNLIAAFVACGFFLFMFGGVASDAWIACLAGGIAFL
TFSFVQKYIQIKFFSEFVASAVVISIAAIFTKLGIAKNQDIITIASVMPLVPGILITNAI 
RDLLAGELLAGMSRGVEAALTAFAIGAGVAIVLLLL* 


Sequence 5 

Contig_0440_pos_8843_8349,

putative peptide of unknown function 

atgtttatttatctgtttcactttatcattagtttcattgccacagtccttttttcaatt 
atatttaatgcacctaaaaaattgctattagcttgtggatttgttggagctgttgcttgg
acaatatatcagatgacagtaggtatggatttaggtaaagttggcgcttcatttttagga 
agtctaatattaggattaatgagtcatacaatgagtagacggtacaagcaacctgttatt
atatttatcgtccccggcattatacctctcgttccaggtggcgcagcatatgaagctaca
agatttttagtatcaaataattatacgaatgcagttaatacttttttagaggtaacatta 
atttctggtgcaattgcattcggtatacttgtatctgaaatagtctattacatttattca
cgcatcaagcaatcttatggtaaaatcaagggtaaaacttataaaaaatcctataatatg 
aataatagagtataa 

Sequence 6 

MFIYLFHFIISFIATVLFSIIFNAPKKLLLACGFVGAVAWTIYQMTVGMDLGKVGASFLG
SLILGLMSHTMSRRYKQPVIIFIVPGIIPLVPGGAAYEATRFLVSNNYTNAVNTFLEVTL
ISGAIAFGILVSEIVYYIYSRIKQSYGKIKGKTYKKSYNMNNRV* 

Sequence 7 
Contig_0440_pos_8175_7090,

is similar to (with p-value 0.0e+00) 

>sp:sp|P55179|PEPT_BACSU PEPTIDASE T (EC 3.4.11.-)
(AMINOTRIPEPTIDASE) (TRIPEPTIDASE). >gp:gp|X99339|BSGALE_6
B.subtilis orfs 1,2,3,4, pepT and galE genes. NID: g1429253.
>gp:gp|Z99123|BSUB0020_187 Bacillus subtilis complete genome
(section 20 of 21): from 3798401 to 4010550. NID: g2636240.
>gp:gp|D83026|D83026_30 Bacillus subtilis genome sequence
covering lic-cel region. NID: g1783231.

atggatgaacatggttacttatttgctacactcgaaagcaatattaattataatgtacct 

actgtcggttttttagcacatgtagacacttcaccagatttcaatgcttctcatgtaaat
ccgcaaatcattgaagcctataatgggcaacctatcaaacttggtgaatctcagcgtatc 
ttagatcctgatgtttttcctgaattaaataaagttgtgggtcatacactaatggtgaca 
gatggtacatctctactaggcgccgatgataaagcaggtgttgtagaaataatggaaggg 
ataaagtatttaattgatcatcctgacattaaacacggtacaattcgagttggctttaca 

cccgatgaagaaattggacgaggcccgcatcaatttgatgttagtcgatttaatgcagat
tttgcatatacaatggatggcagtcaattaggagaactacaattcgaaagtttcaatgcg 
gcagaggtaactgtcacttgccatggtgttaacgttcatccaggttcagctaaaaatgcc 
atggttaatgcaattagtttaggtcaacagtttaatagtttacttccctcacatgaagtg 
cctgaaagaactgaaggatacgaagggttctatcatttaatgaattttacaggtaatgtt 

gaaaaagcaactctacaatatattattcgcgaccatgacaaagaacagtttgagctacgt
aaaaaacgcatgatggaaattcgtgatgatattaatgttcattataatcattttccaatt 
aaagtagatgtgcatgaccaatattttaacatggcagaaaaaattgaacctttgaaacac 
atcattgatatacctaaacgtgtctttgaggctttagacatcgtacctaacactgaacct 
attcgaggtggtacagatggatcacaattatcttttatggggttacctacacctaatatt 

tttactggttgtggcaatttccacggtccttttgaatacgcttctatcgatgtaatggaa
aaggctgttcatgttgtcgttggtattgctcaagaagtagcaaacagccatcaatcttat
aaataa 



Sequence 8

MDEHGYLFATLESNINYNVPTVGFLAHVDTSPDFNASHVNPQIIEAYNGQPIKLGESQRI
LDPDVFPELNKVVGHTLMVTDGTSLLGADDKAGVVEIMEGIKYLIDHPDIKHGTIRVGFT 
PDEEIGRGPHQFDVSRFNADFAYTMDGSQLGELQFESFNAAEVTVTCHGVNVHPGSAKNA 
MVNAISLGQQFNSLLPSHEVPERTEGYEGFYHLMNFTGNVEKATLQYIIRDHDKEQFELR



KKRMMEIRDDINVHYNHFPIKVDVHDQYFNMAEKIEPLKHIIDIPKRVFEALDIVPNTEP
IRGGTDGSQLSFMGLPTPNIFTGCGNFHGPFEYASIDVMEKAVHVVVGIAQEVANSHQSY
K* 

Sequence 9

Contig_0440_pos_4334_3330,

is similar to (with p-value 0.0e+00) 

>sp:sp|P37253|ILVC_BACSU KETOL-ACID REDUCTOISOMERASE (EC 1.1.1.86)
(ACETOHYDROXY-ACID ISOMEROREDUCTASE)
(ALPHA-KETO-BETA-HYDROXYLACIL REDUCTOISOMERASE).
>gp:gp|L03181|BACILNB_3 Bacillus subtilis ilvB, ilvN and ilvC
genes, complete ilv-leu operon. NID: g143090.
>gp:gp|Z99118|BSUB0015_94 Bacillus subtilis complete genome
(section 15 of 21): from 2795131 to 3013540. NID: g2635200.
>gp:gp|Z75208|BSZ75208_74 B.subtilis genomic sequence 89009bp.
NID: g1769994.

atgacaaaagtatattacgacgaaacagtaactcaggatgcattacaaggtaaaaaaatt 
gctgtcattggttatggctcacaaggacatgcacatgcacaaaatttaaaggacaatggt 
tatgatgtagtcattggcctgcgtccaggacgatcatttaataaagctaaagaagatgga 
tttgatgtttatacggtaagtgaagcaacacaacaagcagatgtagtgatggtactattg 

cctgatgaaattcaaggtgaagtatataacaaggaaattaaaccatatttagaaaaagga
aatgctttagcattcgcacacggttttaatatccatttcagtgttatcgaaccacctagt 
gatgtcgatgtctttttagtagcacctaaaggacctggtcatttagttagacgtacattt
gttgaaggaagtgccgtaccagcattatttggtgttcaacaagatgctacaggccaagct 
agaaacattgctttaagctacgcaaaaggcattggtgctactcgtgccggggtcattgaa 

acgacatttaaagaagaaactgaaacagatttattcggtgaacaagctgtactttgtgga
ggagtttccaaattaattcagagtggattcgaaacacttgtggaagcaggttaccaacct 
gaattagcttattttgaagtcttacacgaaatgaaattaattgttgatttaatgtatgaa 
ggcggaatggaaaatgtccgttattctatctctaacactgctgaatttggcgactatgtt 
tctggaccaagagtaattacacctaatgttaaagaaaatatgaaaaaagtacttgaagat 

attcaaaatggtaactttagccgtagatttgttgaagataacaaaaatggctttaaagaa
ttctatcaattacgtgaagatcaacatggtcatcaaattgaacaagttggacgtgaatta 
agagaaatgatgccattcattaaatctaaaagtattgaaaaataa

Sequence 10 

MTKVYYDETVTQDALQGKKIAVIGYGSQGHAHAQNLKDNGYDVVIGLRPGRSFNKAKEDG
FDVYTVSEATQQADVVMVLLPDEIQGEVYNKEIKPYLEKGNALAFAHGFNIHFSVIEPPS 
DVDVFLVAPKGPGHLVRRTFVEGSAVPALFGVQQDATGQARNIALSYAKGIGATRAGVIE 
TTFKEETETDLFGEQAVLCGGVSKLIQSGFETLVEAGYQPELAYFEVLHEMKLIVDLMYE 
GGMENVRYSISNTAEFGDYVSGPRVITPNVKENMKKVLEDIQNGNFSRRFVEDNKNGFKE 

FYQLREDQHGHQIEQVGRELREMMPFIKSKSIEK*



Sequence 11 
Contig_0440_pos_3316_1772,

is similar to (with p-value 0.0e+00) 

>gp:gp|U92974|LLU92974_14 Lactococcus lactis unknown gene,
partial cds, and HisC (hisC), unknown, HisG (hisG), unknown,
HisB (hisB), unknown, HisH (hish), HisA (hisA), HisF (hisF),
HisIE (hisIE), unknown, unknown, LeuA (leuA), LeuB (leuB),
LeuC (leuC), LeuD (leuD), unknown, IlvD (ilvD), IlvB (ilvB),
IlvN, IlvC (ilvC), IlvA (ilvA), AldB (aldB) and aldR (aldR)
genes, complete cds. NID: g2565137.
gtgtttttaatggaagaacatattcaaatttttgatacaacacttagagatggtgaacaa 

acgccaggagtcaattttacttttgatgaaagattaaaaattgccaagcaactagaaaaa
tggggagtagatgtactagaagcaggttttcctgcttctagtactggtagctttaaatca 
gtagaagctatagctaaaactttgactacaacagcagtgtgtggtttagctagatgtaaa 
aaatctgatattgatgctgtatatgaagccactaaagaagctgtaaaacctcaagtacat 
gtattcattgcaacctcccctattcatttagaacataaattaaaaatgactcaagatgaa
gttttaacatcaataaaagaacacgtttcttatgcaaaacaattttttgaagtcgtacaa
ttctctccagaggatgcaacaagaactgaaattccatttttaattgaatgtgttcaaact 
gcgattaacgcaggagccacaattatcaacatccctgatacagttggatttagttatcct 
acagagtatggcgaaatttttaaacaattaacacaggccgttaagtcaaattctaaaatt 
atctttagtgcacattgtcatgatgatctcggaatggcagttgctaatagtttagcagct
attgaaggtggagctagacgtattgaaggtaccgtgaacggtattggtgaaagagcagga 
aatgcctcacttgaagaagtcgctttggctttatatgtaaggaaggaccactatggtctt 
gaatctcaaattaaccttgaagaaactaagaaaacatctgacttaatttcaagatatgct 
ggtatccgtgtacctagaaataaagctatagtcggtcaaaatgcatttagtcatgaatcc 

ggaattcaccaagacggtgtccttaaacatcgtgaaacctatgaaatcatgacacctcaa
cttgtaggtgtgaatacaacagaattgccactaggtaaattgtctggtaaacatgcattt 
gccgaaaagcttaaagctctgggatatgaaattaaattggaagatcaagttacattattt 
aaacaatttaaagaaattgccgataagaaaaaaaatgtatccgatagagatattcatgcg
attatacatggctccgaacatgaacacaatgctatttttcaacttgataacttacaactt 

caatacgtatctaaaggtctacaaagtgcagtagtagttataaaggaaagaaacggacaa
gttaaacaagattcaagtattggaacgggttcaattgttgcaatttataatgctgttgac 
cgaattttcaagaaagacgcagaattaattgattatcgtattgattctgtaacagaaggt 
actgatgctcaagcagaagtacatgtacgaatcattattaatcatattgaagtgacaggc 
ataggtatagaccacgatatattaaaagcttcatgtaaagcatatatcgatgctcatgct 

aaatatatttcagaatatgagttgaaagaaggtatacgtacatga

Sequence 12 

VFLMEEHIQIFDTTLRDGEQTPGVNFTFDERLKIAKQLEKWGVDVLEAGFPASSTGSFKS 
VEAIAKTLTTTAVCGLARCKKSDIDAVYEATKEAVKPQVHVFIATSPIHLEHKLKMTQDE 

VLTSIKEHVSYAKQFFEVVQFSPEDATRTEIPFLIECVQTAINAGATIINIPDTVGFSYP
TEYGEIFKQLTQAVKSNSKIIFSAHCHDDLGMAVANSLAAIEGGARRIEGTVNGIGERAG
NASLEEVALALYVRKDHYGLESQINLEETKKTSDLISRYAGIRVPRNKAIVGQNAFSHES 
GIHQDGVLKHRETYEIMTPQLVGVNTTELPLGKLSGKHAFAEKLKALGYEIKLEDQVTLF 
KQFKEIADKKKNVSDRDIHAIIHGSEHEHNAIFQLDNLQLQYVSKGLQSAVVVIKERNGQ 

VKQDSSIGTGSIVAIYNAVDRIFKKDAELIDYRIDSVTEGTDAQAEVHVRIIINHIEVTG
IGIDHDILKASCKAYIDAHAKYISEYELKEGIRT*



Sequence 13 
Contig_0440_pos_1280_732,

is similar to (with p-value 4.0e-48)

>sp:sp|Q02143|LEU3_LACLA 3-ISOPROPYLMALATE DEHYDROGENASE
(EC 1.1.1.85) (BETA-IPM DEHYDROGENASE) (IMDH) (3-IPM-DH).
>pir:pir|S35133|S35133 probable 3-isopropylmalate dehydrogenase
(EC 1.1.1.85) - Lactococcus lactis subsp. lactis
>gp:gp|U92974|LLU92974_15 Lactococcus lactis unknown gene,
partial cds, and HisC (hisC), unknown, HisG (hisG), unknown,
HisB (hisB), unknown, HisH (hish), HisA (hisA), HisF (hisF),
HisIE (hisIE), unknown, unknown, LeuA (leuA), LeuB (leuB),
LeuC (leuC), LeuD (leuD), unknown, IlvD (ilvD), IlvB (ilvB),
IlvN, IlvC (ilvC), IlvA (ilvA), AldB (aldB) and aldR (aldR)
genes, complete cds. NID: g2565137.

gtgagaatcgcttttaatcttgcaaatcgtagacgtaaaaaattaacttctgttgataag 
gaaaacgttctatcctctagtaagttatggcgacaaatagtaaacgatgtaaaaaaggat 

tatccagaagtagaagttaatcatatgctagttgatgcttgtagcatgcatctgatcact
caacctacgcaatttgatgtgattgtaacagagaatcttttcggagatattttaagtgat
gaagcatctgttataccagggtctctaggtctttctccatcagctagttttggtcaaaca 
ggtacacgtctttatgaaccaattcatggttcagcaccagatatagctaatgaagataaa 
gcgaatccatttggtatggttctatctttagcactttgcttaagagaaagtttaaatcaa 

aatgatgctgctaacgaacttgagtcaattgtttactcgtttattcaatctaataagaca
actgcagatttaggtggacaatatcgaacttcagaaatttttaaattgcttaaagaaaaa 
tatctataa 

Sequence 14 
VRIAFNLANRRRKKLTSVDKENVLSSSKLWRQIVNDVKKDYPEVEVNHMLVDACSMHLIT
QPTQFDVIVTENLFGDILSDEASVIPGSLGLSPSASFGQTGTRLYEPIHGSAPDIANEDK 
ANPFGMVLSLALCLRESLNQNDAANELESIVYSFIQSNKTTADLGGQYRTSEIFKLLKEK
YL* 


Sequence 15 

Contig_0440_pos_716_354,

is similar to (with p-value 1.0e-32) 

>pir:pir|D36889|D36889 3-isopropylmalate dehydratase
(EC 4.2.1.33) chain leuC - Lactococcus lactis subsp. lactis
(strain IL1403)

atgggtcaaacactgtttgataaagtatggaaaaaacatgtgcttcatggaaaagaaggt 
gaaccacaattattatacattgatttacatctcattcatgaagtcacttctcctcaagcg 
tttgaaggacttagaatacaaaatcgtaaactcagaagacctgatctaacctttgcaact 
ttagatcataacgttcccacaattgatatttttaatataaaagatgaaattgctaataaa
caaattacaactttacaacaaaatgctaaggactttggtgtacatatttttgatatgtta 
ctcataattgtcttaagtggacttaacgtatatcttatcattcaaacattccaagaatta 
tga 

Sequence 16

MGQTLFDKVWKKHVLHGKEGEPQLLYIDLHLIHEVTSPQAFEGLRIQNRKLRRPDLTFAT 
LDHNVPTIDIFNIKDEIANKQITTLQQNAKDFGVHIFDMLLIIVLSGLNVYLIIQTFQEL

Sequence 17 

Contig_0441_pos_5436_6512,

is similar to (with p-value 0.0e+00) 

>sp:sp|P39576|ILVE_BACSU PUTATIVE BRANCHED-CHAIN AMINO ACID
AMINOTRANSFERASE (EC 2.6.1.42) (BCAT). >pir:pir|S57763|S57763
amino acid aminotransferase homolog - Bacillus subtilis
>gp:gp|Z49992|BSCELABCD_6 B. subtilis celA, celB, celC, celD and
ywaA genes. NID: g895746. >gp:gp|Z99123|BSUB0020_150 Bacillus
subtilis complete genome (section 20 of 21): from 3798401 to
4010550. NID: g2636240.

atgtcagaaaaagtaaaattcgaaaaaagagagtctttaaaagaaaaacctgatacagca 
aacttaggatttggacaatatttcacagactatatgttaagtgttgattatgacgctgat
caaggatggcatgatatgaagattgtgccgtacgcaccatttgaaatttcaccagcagcg

caagggttacattatggtcaggcagtttttgaaggccttaaagcctataaacataatgga
gaagttgtattattccgcccagatcaaaacttcaaacgtattaataattctttagcacgt 
ttagaaatgccagaagttgatgaagaagcattattagaagggttgaagcagcttatcgac 
gttgaacgagattgggtacctgaaggcgaaggtcaatcgttatatattcgtccttttgta 
tttgctactgaaggtgttttgggtgtacgttcttcacatcaatataaattactaattatt 

ttatctccgtcaggcgcttattatggtggtgacacattaaagtcaactaaaatttatgtc
gaagatgaatatgtacgtgcagtacgtggaggtgtaggtttcgctaaagttgcaggtaac 
tatgctgccagcttactcgcacaaacaaacgctaataaattaggttatgaccaagtattg 
tggcttgatggtgttgaacaaaaatatgttgaagaagttggtagtatgaatattttcttc
gtagaaaatggaaaagtagttacgccagcattaaacggtagtatcttgcctggtatcact 

agaaagtcaattattcaattagctgaagatttaggttatgaagttgaagagagaagagtt
tctatagaagagctgtttaacgcatatgataaaggtgaacttacagaagtatttggttca 
ggtacagcagctgttatctctcctgtaggtacacttcgctatgaagatagagaaattgtt 
attaataacaatgaacctggtaaaatcactcaaaaattatatgacacatatactggtatt 
caaagtggcaaattagaagataaatacggatggagagtagaagttcctaagtattaa 

Sequence 18 

MSEKVKFEKRESLKEKPDTANLGFGQYFTDYMLSVDYDADQGWHDMKIVPYAPFEISPAA 
QGLHYGQAVFEGLKAYKHNGEVVLFRPDQNFKRINNSLARLEMPEVDEEALLEGLKQLID 
VERDWVPEGEGQSLYIRPFVFATEGVLGVRSSHQYKLLIILSPSGAYYGGDTLKSTKIYV 
EDEYVRAVRGGVGFAKVAGNYAASLLAQTNANKLGYDQVLWLDGVEQKYVEEVGSMNIFF
VENGKVVTPALNGSILPGITRKSIIQLAEDLGYEVEERRVSIEELFNAYDKGELTEVFGS 
GTAAVISPVGTLRYEDREIVINNNEPGKITQKLYDTYTGIQSGKLEDKYGWRVEVPKY* 

Sequence 19

Contig_0441_pos_6782_7498,

is similar to (with p-value 1.0e-33) 

>pir:pir|S60902|S60902 hypothetical protein 1 - Haemophilus
influenzae >gp:gp|X78559|HISBCAL_1 H.influenzae DNA for
serotype b capsulation locus. NID: g471233.

atgatttatgcgggtatattagcaggtggtattggttctagaatgggaaatgttccatta 
cccaaacaatttttatcattacaaggaaaacctattattattcatacagtagaaaaattt 
ttaatgtataaggactttgatgaaatcatcattgccacgcctcaaaagtggatcaattat 
atgctcgatttgctaaacaattatcaattagacgataagaaaataaaagtaatacaaggc 

ggagacgaccgaaatcactctataatgaatattatagaaagcattgagcaacataaaaaa
ttaaatgatgaagatataatcgttacccatgatgcagttaggccatttctaacaaatcga 
attattagagagaatgtggaatatgccagtcaatatggtgcagtagatacggttgttaat 
gctgttgatactatcatttcttcaaatgatgcacaatttatttctgggattccaataaga
agtgagatgtatcaaggacagacgcctcaaacttttaaaataaaagagttaaaggatagc 

tatttatcgttaactcaatctcaaaaggaaatattaactgacgcgtgtaaaatactcgta
gaattgggtaagccagtaaaattagtcaaaggagagttatttaacataaaaataacaaca 
ccatatgatttaaaagttgcgaattcaattattactggagctgttgataatgattaa 



Sequence 20 

MIYAGILAGGIGSRMGNVPLPKQFLSLQGKPIIIHTVEKFLMYKDFDEIIIATPQKWINY
MLDLLNNYQLDDKKIKVIQGGDDRNHSIMNIIESIEQHKKLNDEDIIVTHDAVRPFLTNR
IIRENVEYASQYGAVDTVVNAVDTIISSNDAQFISGIPIRSEMYQGQTPQTFKIKELKDS
YLSLTQSQKEILTDACKILVELGKPVKLVKGELFNIKITTPYDLKVANSIITGAVDND*

Sequence 21 

Contig_0441_pos_7863_8522,

putative peptide of unknown function 

atgaaacctgatagagtagtcactttaccacaagaaattgatttgagtgtagcgtcgtat
actgaattagttacagttagtgttcatgcaatagatcgttttcaatcaaaagcaatacca 
caatttgaatcattggggatatggggagacggtaatttagggtatatcactgcagtttta
ctaaaaaaactatatcctactacaaaaataatagtttttggaaaaacattatataaatta 
agccgtttttcatttgtagatgaaattattcaaattgacaatattcctcaacatatcaaa 

attgatcatgcatttgaatgcgtgggtggtaaaggaagtcagcaggctattgaacaaatt
attaatattattaatcctgaagggagtatcgctttattaggagtgagcgaactgcctata 
caagtgaatacaagaatggttttagaaaaaggtttaactataattggtagtagcagaagt 
ggtttaaaggattttgaaaaaactattgaattgtatcgtaaatatcctgaagttcttaat 
caattagcattacttaaaggtaaagaatttgaaataaataccatagaagatctcattaca 

gcgttcgaatatgatatttctaacgcatggggaaaaacagttttaaaatggaatatttaa



Sequence 22 

MKPDRVVTLPQEIDLSVASYTELVTVSVHAIDRFQSKAIPQFESLGIWGDGNLGYITAVL 
LKKLYPTTKIIVFGKTLYKLSRFSFVDEIIQIDNIPQHIKIDHAFECVGGKGSQQAIEQI
INIINPEGSIALLGVSELPIQVNTRMVLEKGLTIIGSSRSGLKDFEKTIELYRKYPEVLN 
QLALLKGKEFEINTIEDLITAFEYDISNAWGKTVLKWNI* 

Sequence 23 
Contig_0441_pos_8539_9987,

is similar to (with p-value 5.0e-32) 

>pir:pir|S49240|S49240 hypothetical protein 3 - Haemophilus
influenzae >gp:gp|Z37516|HIACAPIID_3 Haemophilus influenzae
serotype a capsulation locus region II DNA. NID: g547510.
atgacaaagcaaaatatatttatagatgacatttattgggaacgtgtccaactcttcgtc
aaaggacattttgaaggagtaaaacctacaagaaatttccttcttagaaatctaacagaa 
acaaaactattaaatgccaatcatgttaatattcaagggtcaacttttgaggcaagattt 
aatattgctattttagaaaaaggtaattttttaggtacaggcaattatatattaatcaac 
cgacaagaagatgaatatgtctgccaaattaaccccaaatttttgaatgataaaaaaaat
cagatgactttagaggagttaagagattacaactcacttgagacccaatcgttacaaaaa 
agttatttattaaaaaagtatggtaaaagtttccaaagatataataacaaagagattaaa 
tcttacgtcattgttccggcaatatcccaagaaattaatgagtttatttttaaagttcaa 
tataaatctgaaataaagaaaataagtaaacttaagcaattatcatacatattacataaa 

gctttgaggaaaattagcttcaatgtgagagataaaatatatttgtcggtatttaacatt
tccaaaacagtatataagaataataaaaatcatgttttgtttacatcagattctagagca 
aatatgtcaggaaattttaaatttatatacgaagaaatgcttaaacaacaattggacaaa 
aaacttgtcattcattctatttttaaacctaatatagcaaataggagatcgtttattgat 
aaattaaaatttccatattttttaggaaaatctaaatatatcttggttgatgattatcat 

ccgatgatatataaacttcaatttagagaaaaccaagaaatagttcaagtatggcatgct
gtgggtgcttttaagactgtaggatttagtagaactgggaaaaaaggaggacctttcata 
gactctattggacataggaattatagtaaagcttatgtttcgtcaaataatgatattctt 
tactatgctgaagcttttggaattgaagaacatagggttattccaacaggtgttccacgt 
acggatgttttgttcgatgaatcttataaaacacgcattaaacaaagtttagaaacaaaa 

ttaccaattataaaaaataaaaaagtcattctttttgcacctacatttagaggaaatgga
catcgcacagcacactatcctttctttaaaattaattttgcaagattagctagttattgt
gaagaacatcaagctactgttctgtttaaaatgcatccttttgttagaaataaattaaat 
atcccagcaatttatagtaaatattttttagatatttcaaattaccgcgaagtaaatgat 
gtattgttcattacggatattttaatctctgattattcttctttaatctatgaattttcc 

agtttttaa

Sequence 24 

MTKQNIFIDDIYWERVQLFVKGHFEGVKPTRNFLLRNLTETKLLNANHVNIQGSTFEARF
NIAILEKGNFLGTGNYILINRQEDEYVCQINPKFLNDKKNQMTLEELRDYNSLETQSLQK 

SYLLKKYGKSFQRYNNKEIKSYVIVPAISQEINEFIFKVQYKSEIKKISKLKQLSYILHK
ALRKISFNVRDKIYLSVFNISKTVYKNNKNHVLFTSDSRANMSGNFKFIYEEMLKQQLDK
KLVIHSIFKPNIANRRSFIDKLKFPYFLGKSKYILVDDYHPMIYKLQFRENQEIVQVWHA
VGAFKTVGFSRTGKKGGPFIDSIGHRNYSKAYVSSNNDILYYAEAFGIEEHRVIPTGVPR 
TDVLFDESYKTRIKQSLETKLPIIKNKKVILFAPTFRGNGHRTAHYPFFKINFARLASYC

EEHQATVLFKMHPFVRNKLNIPAIYSKYFLDISNYREVNDVLFITDILISDYSSLIYEFS
SF* 

Sequence 25 

Contig_0441_pos_1200_13,

is similar to (with p-value 5.0e-90)

>gp:gp|Y14083|BSY14083_1 Bacillus subtilis chromosomal DNA,
region 76-78 degrees: between glyB-aprE. NID: g2226224.
atggaagaagtgatacatgtgtttgattggtttcaattagcaagtaataaagaaaagaga 
atggtgcaattacgacgatatttgcatcaatacccagaactttcttttgaggaaaaacgt 

acgcatgattttattgtaaatcaattgagccaattagcatgcaccatagaaacaccagtt
ggacgtaatggtataaaagcaacttttaaaggatctgattcaaatggaccaacgattgca 
ttacgagcagatttcgatgcactacctgttcaagaattaaatgatgtaccctatcgttca
aaaaataaagggtgcatgcatgcttgtggacatgacggacatacagctattttgcttgga 
gtagctgaaattgttcatgagcatcgtcatttattgaaaggtaatgttgtttttatattc 

caatatggtgaggaaattatgccaggtggttctcaagagatgattgatgatggctgtcta
cagaatgtcgataaaatatatggcacacacttatggagtggttatccatctgggacaatc 
tattctagacctggagcaataatggcttcaccagatgaatttagtgtgactatatatgga 
aaaggtggtcacggtgcaaaaccacacgaaacaatagaccctattgtcattatggctgag 
tttattttaagtgcccaaaaaataatttctcgaacaattgatccagtaaaggaagctgtt 

cttactttcggaatgattcaagcaggatcaacagatagtgttattccagatacagctttt
tgtaaaggtactgtacgtacttttgacacaaaattacaaagtcatgttcaaaataaaatg 
gataagctcttacaaggtttatctttatcaaacgatattacatatgaattggaatatatt 
aaaggttatttaccagtacacaatcatcaacaatcatatgatgtagtcaaacaagcagct 
aatgatttacatttaagatttaatgagtcagacttaatgatgattggtgaggacttttca 
cattaccttaaagtacgacctggtgcattcttcttaactggttgtggtaataaagacaaa
ggcattactgcacctcatcataatcctcattttgacattgatgaatcttcattaaaatat 
gcagctagtgaatttttaaaaatattagaaattgaaaatgttttttaa 

Sequence 26

MEEVIHVFDWFQLASNKEKRMVQLRRYLHQYPELSFEEKRTHDFIVNQLSQLACTIETPV
GRNGIKATFKGSDSNGPTIALRADFDALPVQELNDVPYRSKNKGCMHACGHDGHTAILLG 
VAEIVHEHRHLLKGNVVFIFQYGEEIMPGGSQEMIDDGCLQNVDKIYGTHLWSGYPSGTI 
YSRPGAIMASPDEFSVTIYGKGGHGAKPHETIDPIVIMAEFILSAQKIISRTIDPVKEAV
LTFGMIQAGSTDSVIPDTAFCKGTVRTFDTKLQSHVQNKMDKLLQGLSLSNDITYELEYI
KGYLPVHNHQQSYDVVKQAANDLHLRFNESDLMMIGEDFSHYLKVRPGAFFLTGCGNKDK
GITAPHHNPHFDIDESSLKYAASEFLKILEIENVF* 

Sequence 27 
Contig_0442_pos_3158_4903,

is similar to (with p-value 0.0e+00)

>sp:sp|O06446|SECA_STAAU PREPROTEIN TRANSLOCASE SECA SUBUNIT.
>gp:gp|U97062|SAU97062_1 Staphylococcus aureus NCTC 8325
SecA (secA) gene, complete cds. NID: g2078389.

atgggtggtattgctatacataaaggtgatattgcagaaatgagaacaggtgaagggaaa
acattgactgcaaccatgccgacgtatttgaatgctttagctggtagaggtgtacatgtt
attacagtcaatgaatatctatcaagttcacaaagtgaagaaatggctgaactatataac 
tatcttggcttaactgtaggtttgaacttaaatagtaagtcaactgaagaaaaacgtgag 
gcttacgcacaagatatcacttatagtacgaataatgaacttgggtttgattatcttaga

gataatatggtgaactatgctgaagagagagtaatgcgtcctctacattttgcaattatt
gatgaggtcgattccatattgatcgacgaagcaagaacacctttaattatttctggtgaa 
gcggaaaaatctacttctttatatactcaagcaaatgtttttgcaaaaatgcttaaagcg
gaagatgattataattatgatgaaaaaaccaaagctgtacatcttacagaacaaggtgca 
gataaagctgaacgtatgttcaaagtagataatctttatgatgttcaaaatgtggaagtg 

attagtcatattaatacagctttaagagctcatgttactttgcaacgcgatgttgattac
atggtcgttgacggtgaagtattaattgttgaccaatttactggacgtacaatgcctgga 
cgtcgtttttctgaaggtttacaccaagcaattgaggctaaagaaggtgtagcaattcaa
aatgagtctaaaacgatggcatccattactttccaaaactatttcagaatgtataataag 
ttagcggggatgactggtacagcgaaaaccgaagaggaagaatttcgtaatatctataat 

atgacagttacccaaattccaacaaacaaacctgttcaacgtaaagataattcagactta
atttatattagtcaaaaaggaaagtttgatgcggtagttgaagatgttgtagaaaaacat 
aaaaaaggacaacccgtcttactaggtactgttgctgttgagacttctgaatatatttca 
aatttactaaaaaaacgtggtgtcagacatgacgtattaaacgctaaaaatcatgaacgc 
gaagctgaaatcgtttcaaacgcggggcaaaaaggtgcagttacaattgccacaaatatg 

gctggacgtggaacagatattaaacttggtgatggtgttgaagagttaggtggacttgct
gttattggtactgagcgtcatgaatcaagacgtattgatgatcaattacgtggacgttca 
ggacgccaaggtgatagaggagatagtcgtttttacctatctttacaagatgaattaatg 
gtacgttttggttcagaacgcttacagaaaatgatgaaccgtttaggaatggatgattca 
acgccaatcgagtcgaaaatggtatctcgagctgtagaatcagctcaaaaacgagtagaa 

ggtaataactttgacgcgcgtaaacgtattctagaatacgatgaagttttacgtaagcaa
cgtgaaattatttataatgagcgtaatgaaatcattgatagtgaagaaagttctcaagtc
gttaacgcgatgttacgttctacattgcaacgtgcgattaatcattttattaatgaagaa 
gacgataatcctgactacacgccatttatcaattacgttaatgatgtgttcttgctgaat 
tattaa 

Sequence 28 

MGGIAIHKGDIAEMRTGEGKTLTATMPTYLNALAGRGVHVITVNEYLSSSQSEEMAELYN 
YLGLTVGLNLNSKSTEEKREAYAQDITYSTNNELGFDYLRDNMVNYAEERVMRPLHFAII
DEVDSILIDEARTPLIISGEAEKSTSLYTQANVFAKMLKAEDDYNYDEKTKAVHLTEQGA 
DKAERMFKVDNLYDVQNVEVISHINTALRAHVTLQRDVDYMVVDGEVLIVDQFTGRTMPG
RRFSEGLHQAIEAKEGVAIQNESKTMASITFQNYFRMYNKLAGMTGTAKTEEEEFRNIYN 
MTVTQIPTNKPVQRKDNSDLIYISQKGKFDAVVEDVVEKHKKGQPVLLGTVAVETSEYIS
NLLKKRGVRHDVLNAKNHEREAEIVSNAGQKGAVTIATNMAGRGTDIKLGDGVEELGGLA 
VIGTERHESRRIDDQLRGRSGRQGDRGDSRFYLSLQDELMVRFGSERLQKMMNRLGMDDS
TPIESKMVSRAVESAQKRVEGNNFDARKRILEYDEVLRKQREIIYNERNEIIDSEESSQV
VNAMLRSTLQRAINHFINEEDDNPDYTPFINYVNDVFLLNY* 

Sequence 29
Contig_0442_pos_6048_7553,

is similar to (with p-value 3.0e-31) 

>sp:sp|P13484|TAGE_BACSU PROBABLE POLY(GLYCEROL-PHOSPHATE)
ALPHA-GLUCOSYLTRANSFERASE (EC 2.4.1.52) (TEICHOIC ACID
BIOSYNTHESIS PROTEIN E). >pir:pir|S06048|S06048 probable rodD
protein - Bacillus subtilis >gp:gp|X15200|BSRODC_1 Bacillus
subtilis rodC operon. NID: g40098. >gp:gp|Z99122|BSUB0019_70
Bacillus subtilis complete genome (section 19 of 21): from
3597091 to 3809700. NID: g2636029.

atgatatattctatcggtaagaatttaggtaataaattaacaggtatagaacaagctatg 

atcaatagattaaagctatttaaagataatttagtcccaaataaactcatattcacatct
tggtcaccacgtttatatatgcatgcacattcgttaaacatcgattcaaaagatattttc 
agtctttacgattttctacaagatagtattaactttgagaaaaaacatattgattggata 
aattattggcaaaatatatgtaattataccttaaaattcgttgaaaatacgaatgatatt 
aaaatatacgataacgacacatataaaatgtatgtgcattttgttgattcaaattatcaa 

actttagactatattaaccattttgatatacaacaacgtaaaattcgaagagatttttac
gatacaagaggctttttaagttgtagtagaattttaacctctcaacaaaaagtcgtgatg 
gaacaattttttacacctacacaaaaagttaaatttcaaaaatattacaaccctgagcac
gaacatcctacggtacaatctatcatttataatacttcacgagacgttcgttttttcaac 
gatgaaaatgaacttttagcgtttgcaattaatgcgctatatcatttaggagacgtattt 

ttatgtgataaaaacatcgttacagggccaatcattgatcaaactgatactaaaatacca
gttctcgccgttttccacagtacccacgtaaaaaatatcaatgatatatatcattctgaa 
atcaaacaagcgtataaacctgttttagataacttatcccgatattcaggaatcatagta 
tctactgaacaacaaaaaacagatttatctgtaaaaattaataacgttattcccatttac 
gttatacctgcaggttatattgatacaaatgaatctcatcatagtagtgacaataaacca 

ttgcctaacaaaatgatttctatcggccgttattctcctgaaaagcaattagatcatcaa
atagaactaatgtctaagctagttccagcatttccaaatttacagttacatttatttggg 
tttggtaaagaagaaacacattatcgtaaattaatagctcaatatcatttagaaaatcac 
gtgtttttacgtggcttcatttatgatttaaatcaagaaatagagaccgcctatttatct 
ttattgacaagtaaaatggaagggttcaatttaggtgtacttgaaacgattgctaaaggc 

gtacctacagtaagttatgataccaaatacggaccttctgagttaattgttaatcataaa
aatggctttttaattgaacaagataataaagaacaactctatcacagcgttaaaaagtta 
ttactcgattctaacttaagagaacaattttctaaggaaagtattaaacatgcccaaata 
tttaatgacaaaaatgtttttgatacatggctcactgttttcagaacgttaaaagttaat 
ttataa 

Sequence 30 

MIYSIGKNLGNKLTGIEQAMINRLKLFKDNLVPNKLIFTSWSPRLYMHAHSLNIDSKDIF
SLYDFLQDSINFEKKHIDWINYWQNICNYTLKFVENTNDIKIYDNDTYKMYVHFVDSNYQ
TLDYINHFDIQQRKIRRDFYDTRGFLSCSRILTSQQKVVMEQFFTPTQKVKFQKYYNPEH 

EHPTVQSIIYNTSRDVRFFNDENELLAFAINALYHLGDVFLCDKNIVTGPIIDQTDTKIP
VLAVFHSTHVKNINDIYHSEIKQAYKPVLDNLSRYSGIIVSTEQQKTDLSVKINNVIPIY
VIPAGYIDTNESHHSSDNKPLPNKMISIGRYSPEKQLDHQIELMSKLVPAFPNLQLHLFG 
FGKEETHYRKLIAQYHLENHVFLRGFIYDLNQEIETAYLSLLTSKMEGFNLGVLETIAKG 
VPTVSYDTKYGPSELIVNHKNGFLIEQDNKEQLYHSVKKLLLDSNLREQFSKESIKHAQI

FNDKNVFDTWLTVFRTLKVNL*

Sequence 31 

Contig_0442_pos_10138_10893,
putative peptide of unknown function 
atgaatctgagtcactgtctgcgtccgaatcactatctgcgtcggagtcgctatctgaat
cggaatcactatctgcgtctgagtcgctatctgaatcggaatcactgtcggagtctgagt
cactgtctgaatccgagtcactgtctgaatcggaatccgaatcgctgtctgaatccgagt 
cactgtctgaatctgagtcactgtctgcgtccgaatcactatcagaatccgagtcgctgt 
ctgcgtccgaatcactgtctgaatctgagtcactgtctgcgtccgaatcactgtcagaat 
ccgagtcgctgtctgcatccgaatcactatctgaatccgagtcgctgtctgcatccgaat
cactatctgaatccgagtcgctatctgcatccgaatcactatctgcgtctgagtcactgt 
ctgcgtccgaatcactatctgcgtctgagtcactgtctgaatcggaatccgaatcactat 
cagaatccgagtcgctgtctgagtcagaatcactatctgaatctgagtcactgtctgcgt 
cagaatcactatctgcgtctgagtcactgtctgaatcggaatccgaatcactatcagaat
ccgagtcgctgtctgagtcagaatcactatctgaatctgagttactgtctgcgtcagaat 
cgctgtctgcgtccgagtcactgtcggagtctgaatcactatctgcgtctgagtcactgt 
ctgaatcatcgtcaaaataaccattatctatactaa 

Sequence 32

MNLSHCLRPNHYLRRSRYLNRNHYLRLSRYLNRNHCRSLSHCLNPSHCLNRNPNRCLNPS
HCLNLSHCLRPNHYQNPSRCLRPNHCLNLSHCLRPNHCQNPSRCLHPNHYLNPSRCLHPN 
HYLNPSRYLHPNHYLRLSHCLRPNHYLRLSHCLNRNPNHYQNPSRCLSQNHYLNLSHCLR 
QNHYLRLSHCLNRNPNHYQNPSRCLSQNHYLNLSYCLRQNRCLRPSHCRSLNHYLRLSHC 

LNHRQNNHYLY*



Sequence 33 
Contig_0442_pos_13870_13430,

is similar to (with p-value 4.0e-18) 

>gp:gp|AJ005645|SAU5645_1 Staphylococcus aureus sdrC gene.
NID: g3550591.

gtgacggataccaatgcgatggtagatagcttcaatcctgatttaaatagttctaatgta 
aaagatgtgacaagtcaatttacacctaaagtaagtgcagatggtactagagttgatatc
aattttgctagaagtatggcaaatggtaaaaagtatattgtaactcaagcagtgagacca 
acgggaactggaaatgtttataccgaatattggttaacaagagatggtactaccaataca 
aatgatttttatcgtggaacgaagtctacaacggtgacttatctcaatggttcttcaaca 
gcacagggggataatcctacatatagtctaggtgactatgtatggttagataaaaataaa 
aacggtgttcaagatgatgatgagaaaggtttagcactgagagatcccctcataatttcc
ccaaagcgtaaccatgtgtga 

Sequence 34 

VTDTNAMVDSFNPDLNSSNVKDVTSQFTPKVSADGTRVDINFARSMANGKKYIVTQAVRP
TGTGNVYTEYWLTRDGTTNTNDFYRGTKSTTVTYLNGSSTAQGDNPTYSLGDYVWLDKNK
NGVQDDDEKGLALRDPLIISPKRNHV* 

Sequence 35 

Contig_0442_pos_10881_9397,

putative peptide of unknown function

atggttattttgacgatgattcagacagtgactcagacgcagatagtgattcagactccg 
acagtgactcggacgcagacagcgattctgacgcagacagtaactcagattcagatagtg 
attctgactcagacagcgactcggattctgatagtgattcggattccgattcagacagtg 
actcagacgcagatagtgattctgacgcagacagtgactcagattcagatagtgattctg 

actcagacagcgactcggattctgatagtgattcggattccgattcagacagtgactcag
acgcagatagtgattcggacgcagacagtgactcagacgcagatagtgattcggatgcag 
atagcgactcggattcagatagtgattcggatgcagacagcgactcggattcagatagtg
attcggatgcagacagcgactcggattctgacagtgattcggacgcagacagtgactcag 
attcagacagtgattcggacgcagacagcgactcggattctgatagtgattcggacgcag 

acagtgactcagattcagacagtgactcggattcagacagcgattcggattccgattcag
acagtgactcggattcagacagtgactcagactccgacagtgattccgattcagatagcg 
actcagacgcagatagtgattccgattcagatagcgactccgacgcagatagtgattcgg 
acgcagacagtgactcagattcatacagtgactcagattcagacagtgattcggacgcag 
acagtgactccgactccgacagcgattcagactcagatagtgactcagacgcagacagtg 

actcggactcagatagtgattcagatgcagaaagcgattcagactcagatagcgactccg
attcagacagcgactccgactcagacagtgattccgattcagacagcgattcggactcag 
atagtgactcagacgcagatagtgattccgattcagatagcgactccgattctgatagtg 
actccgattcagatagcgactccgattcagatagtgattcggacgcagacagtgactcgt 
actcagatagtgactccgattcagacagtgattcggattccgatagcgattcggattccg 
atagtgactcggattcagacagtgattcggactcagacagcgactccgattcagatagtg
attccgactcagacagcgattcggattccgatagtgactcggattcagacagtgattcgg 
actcagacagcgattccgattccgatagtgactcggattcagacagtgattcgggctcag 
acagcgattccgattcagacagtgactcggactcagatagtgactccgattcagacagcg 
actcggattctgataaaaatgcaaaagataaattacctgatacaggagcaaatgaagatc
atgattctaaaggcacattacttggaactttatttgcaggtttag 

Sequence 36 

MVILTMIQTVTQTQIVIQTPTVTRTQTAILTQTVTQIQIVILTQTATRILIVIRIPIQTV 
TQTQIVILTQTVTQIQIVILTQTATRILIVIRIPIQTVTQTQIVIRTQTVTQTQIVIRMQ
IATRIQIVIRMQTATRIQIVIRMQTATRILTVIRTQTVTQIQTVIRTQTATRILIVIRTQ
TVTQIQTVTRIQTAIRIPIQTVTRIQTVTQTPTVIPIQIATQTQIVIPIQIATPTQIVIR 
TQTVTQIHTVTQIQTVIRTQTVTPTPTAIQTQIVTQTQTVTRTQIVIQMQKAIQTQIATP 
IQTATPTQTVIPIQTAIRTQIVTQTQIVIPIQIATPILIVTPIQIATPIQIVIRTQTVTR 
TQIVTPIQTVIRIPIAIRIPIVTRIQTVIRTQTATPIQIVIPTQTAIRIPIVTRIQTVIR
TQTAIPIPIVTRIQTVIRAQTAIPIQTVTRTQIVTPIQTATRILIKMQKINYLIQEQMKI 
MILKAHYLELYLQV* 

Sequence 37 
Contig_0442_pos_9263_7629,

putative peptide of unknown function 

atgagtatggaaaatcatatagaaagattgattaatcatgttgaaaaaacaatagaaata
aaagaatatgcttttttaagccttggaaaatctaatataaaagccaaagttaaattatta 
aaaaagcctaattaccttagaagggatattactaaagaaattcaaaagtttagacagaaa 

acaggagcgtttccttcatgggtaaaaatagacattgttactgaagaagaagttacttta
tttaaagatgttaaagatgaattaacgcaaactagaagaaattatattgattttggtata 
gctttagatcaatactggaatttatcatttttacctgaagaaataaacactaatgcattt 
attaaaccagtgaaaacagatgggaaaacgaagcttattctatctgaacaaaatataaat 
aactatttaagaaagtatacgaaccataagaaaaagtttgcttatgatttttatgaaaac 

aaagaagtcattaagttcaaaactaaaggttttatcttagacgaacaaaagatatatgaa
ttacacgatgaaggttataaaaaaggattaagaaaggtcgattatttacataaagaaata 
gaccaattaattgaaagtggtacatatttcctaggaaatatgctatcagatactggaaga 
tatcaatatggttattttccacattttgataaagaaatcaatttctataatatattaaga 
catgcttcttcaacttatgcattaatagagggtttagattatttaggagaagatttaact 

atagtcgaaaaggcaattaactacgttattgagaattatttctatgataatgaaggtgtt
ggatatatctttgatgatacaaaagatattaacgaaataaaattaggacaaaatgctgcc 
tttatatttgcggtttgtgaatatttaaagcataaccccaataagcaatacttatgcgtg 
gcacaaaaagttgctcgaggaattttatcaatgattaatcaagatacatacgaaacaact 
catattttaaattatccggatttaactgtgaaagaatcatttagaattatttattatgat

ggtgaagcagctttagcattattacgcttatatcaccaagatcataatgataaatggttg
gaagttgtgaagaagttaatggatcgatttattgaaaaagagtattggcaataccatgat 
cattggcttgggtattgcacgaacgagttagttcaattgtgtccgcaagataaatatttt 
gaatttgggattaagaatgtgaacacctatttggaatatattgaacaacgtgaaacaaca 
ttcccaacatttttagaaatgttaatggcaacctataagcttattcaaaaagcaaaagct 

acacatcgtcaagagcttgtgactcagttgattgatgaagaaaaattaatcaatgtgatt
catacaagagcagaatatcaacgagtaggatttttctatcctgaaattgcaatgtacttc 
aaaaatccaaagcgaattcttggtagtttcttcataaaacaccatgggtatcgagttcgg 
attgatgatatcgaacattatatatcaggctatgttcaatatcaaaaggcacagattaaa 
gatgaaatattatag 

Sequence 38 

MSMENHIERLINHVEKTIEIKEYAFLSLGKSNIKAKVKLLKKPNYLRRDITKEIQKFRQK
TGAFPSWVKIDIVTEEEVTLFKDVKDELTQTRRNYIDFGIALDQYWNLSFLPEEINTNAF
IKPVKTDGKTKLILSEQNINNYLRKYTNHKKKFAYDFYENKEVIKFKTKGFILDEQKIYE
LHDEGYKKGLRKVDYLHKEIDQLIESGTYFLGNMLSDTGRYQYGYFPHFDKEINFYNILR
HASSTYALIEGLDYLGEDLTIVEKAINYVIENYFYDNEGVGYIFDDTKDINEIKLGQNAA 
FIFAVCEYLKHNPNKQYLCVAQKVARGILSMINQDTYETTHILNYPDLTVKESFRIIYYD
GEAALALLRLYHQDHNDKWLEVVKKLMDRFIEKEYWQYHDHWLGYCTNELVQLCPQDKYF 
EFGIKNVNTYLEYIEQRETTFPTFLEMLMATYKLIQKAKATHRQELVTQLIDEEKLINVI 
HTRAEYQRVGFFYPEIAMYFKNPKRILGSFFIKHHGYRVRIDDIEHYISGYVQYQKAQIK
DEIL* 

Sequence 39 
Contig_0442_pos_5645_5175,

putative peptide of unknown function 

atgtttagttatcaaataaataaaaatattaaattaaaaatattagaagaacgagaagcc 
gaacagttatttaaattagtagatagcaatcgtgactatttagctgaatttctgcctttt 
gttgaacatacgaagaaagttgaagatagtaaacactttatccattcggcgttgcaacaa 
tttatcgatggtaatggatttcattgtggaatatggagtaataaagaattgattggagtc
ataggattgcattacttagatttagttaataaaacaacttcaattggttattatttagct 
gaagactttcaaaagaaaggtattatgactaaatgtactaaagcgttaattcgctatgta 
tatgaagtgtatgatattaatcgtatagaaatacgaatgtctactaaaaataagaaaagc 
agagctataccaattagacttgggttcacgcaaaggtatattgagaagtaa 

Sequence 40 

MFSYQINKNIKLKILEEREAEQLFKLVDSNRDYLAEFLPFVEHTKKVEDSKHFIHSALQQ 
FIDGNGFHCGIWSNKELIGVIGLHYLDLVNKTTSIGYYLAEDFQKKGIMTKCTKALIRYV 
YEVYDINRIEIRMSTKNKKSRAIPIRLGFTQRYIEK*

Sequence 41

Contig_0442_pos_0_925,

is similar to (with p-value 3.0e-20) 

>pir:pir|S52351|S52351 hypothetical protein 1 - Staphylococcus
xylosus >gp:gp|X84332|SXGKG2_1 S.xylosus glucose kinase gene.
NID: g666114.

atgctaattaataatgaagataaaaggacttaccttcattacaaaagaaaagttttaaca 
caaaatcttgttgataaacatatgcaacgttttacccctattacatatacacttatctta 
attaatattgtgatatggttatgtatgattttatacttaaatcgattttctgatgttaaa 

ctattagaagtaggtggacttgttcattttaatgttgttcacggagaatggtatagactt
atttcgtcaatgtttttacattttaatttcgaacacattttaatgaatatgctctctcta 
tttatttttggtaaaattgtcgaatcaatcattggatcatggcgaatgctaataatttat
ataatatccggattatatggaaattttgtttctctatcatttaatacgactacaatttca 
gtcggtgctagtggagcaatatttggtctaattggttctatttttgtgattatgtattta 

agcaagaattttaataaaaaaatgattggccagttattaattgctttggttgttttaatc
gttttttcactttttatgtctaatattaatataatggcacatttaggtggatttatcagt
ggtgtattaattacattaataggctattatttcaaaacacaacgctctttattttggtca 
tttttgattgtatttttacttatattcatcattttacaaattagaatatttactataagt 
gaggataatatctatgataaattaattcgggatgaaatgattaaaggtaattatagcgaa 

gcaaaaaatgttgtaaaacaaacacttaataataattacgccgatgatgaaacatattac
cttagtggtttgattactgcaactaagagttcgcaagcagaggccgtatcagaatgggaa 
agaggtttaagaaaatttccaaatt 

Sequence 42

MLINNEDKRTYLHYKRKVLTQNLVDKHMQRFTPITYTLILINIVIWLCMILYLNRFSDVK
LLEVGGLVHFNVVHGEWYRLISSMFLHFNFEHILMNMLSLFIFGKIVESIIGSWRMLIIY
IISGLYGNFVSLSFNTTTISVGASGAIFGLIGSIFVIMYLSKNFNKKMIGQLLIALVVLI
VFSLFMSNINIMAHLGGFISGVLITLIGYYFKTQRSLFWSFLIVFLLIFIILQIRIFTIS
EDNIYDKLIRDEMIKGNYSEAKNVVKQTLNNNYADDETYYLSGLITATKSSQAEAVSEWE

RGLRKFPNX

Sequence 43

Contig_0443_pos_545_1741,
putative peptide of unknown function 
atggttaaatttatacactgtgctgatttgcatttggacagtcctttcaaatctaaaagt
tatcttagtccaaatatttttgaagatgtccaaaagagtgcatatgaaagttttaaaaac 
atagtcgacttagctttaaaacaggaagtcgattttattattatagcaggtgatttattt
gatagtgagaatcgtacattgcgtgctgaagtctttttaaatgaacaatttgaaagatta 
agaaaagaacaaatatttgtttatatttgccatggcaaccacgatcctcttacttctaaa
ataacaagtcagtggcctaataacgtatccgtattttcaaatcaagtagagacatatcaa
gctatcactaaatcaggagaaacaatttatattcatggattcagctatcaaaatgatgcg 
agttatgaaaataaaatagacgcatacccatcaagtcaaggtcagaagggcatacatatt 
ggtgtattacatggaacttatagtaaatcttcggtgaaagaccgttatactgaatttagg 
ttagaagacttaaatcaacgtttataccactactgggcattaggacatatacaccaacgt
gaacagttaagtgacatgccagtcattaactatccaggtaatatccaaggaagacatttc 
aatgaattaggagaaaaaggttgtctattggtcgaaggtgatcatcttaaactcactaca 
caattttatcctactcaatttattaaatttgaagaagctacaattgaaactgatcataca 
tctaaacaaggactttatgatgttattcaatcttttaaagataaagtaagaactgaaggg 

aaatcattttatagattgaacgtacgcattaatagtgaagacattattgcaccacaagat
ttaattcaattaaaagaaatgattactgagttcgaagaaaacgaaaatcaatttgttttt 
attgaagatttaaatcttcaatatgttcaaaatgacgaaatgccaatagttaaagagttt 
tcaccagaattacttgatgatgcgtcactttttgattcggcaatgactgatttatatctt 
aatccaagggcttctaagtttttagatgactataatgaatttgataaagttgagttagtc 

aatcatgcagaaagacttttaaaggatgaaatgagaggtgaacaaaatgataattaa

Sequence 44

MVKFIHCADLHLDSPFKSKSYLSPNIFEDVQKSAYESFKNIVDLALKQEVDFIIIAGDLF
DSENRTLRAEVFLNEQFERLRKEQIFVYICHGNHDPLTSKITSQWPNNVSVFSNQVETYQ
AITKSGETIYIHGFSYQNDASYENKIDAYPSSQGQKGIHIGVLHGTYSKSSVKDRYTEFR
LEDLNQRLYHYWALGHIHQREQLSDMPVINYPGNIQGRHFNELGEKGCLLVEGDHLKLTT 
QFYPTQFIKFEEATIETDHTSKQGLYDVIQSFKDKVRTEGKSFYRLNVRINSEDIIAPQD 
LIQLKEMITEFEENENQFVFIEDLNLQYVQNDEMPIVKEFSPELLDDASLFDSAMTDLYL 
NPRASKFLDDYNEFDKVELVNHAERLLKDEMRGEQNDN* 

Sequence 45

Contig_0443_pos_2433_0,

is similar to (with p-value 1.0e-45) 

>sp:sp|P54596|YHCL_BACSU HYPOTHETICAL 49.0 KD PROTEIN IN
CSPB-GLPP INTERGENIC REGION. >gp:gp|X96983|BS75DGREG_13
B.subtilis chromosomal DNA (region 75 degrees: cspB upstream of
glpPFKD operon). NID: g1239975. >gp:gp|Z99108|BSUB0005_181
Bacillus subtilis complete genome (section 5 of 21): from 802821
to 1011250. NID: g2633055.

atgcatgagcaaaaacaaaaagaggttgctctacacgatcaaacacaagaatggaaaagg
ttagaacagtcgcttaatatagagcctataaattttcctgaaaaagggatagatagatac 
gaaactgctaaatctcacaaacaatcacttgaacgagataaaagtttgcgagaagaaaga 
ttaagcatattaaataaagaggcggagtccatcaatccagtagaccaaaagtatattgat 
tcgtttaatagcctttatcaacaagagactgaaattaaacaaaaagaatttgagttacgt 

tcaatagagaaagatattgctgataagcaacgtgaactagaagctcttcaatctataggt
atcgtatttggcattgtattacacctcatatatggtgcagagtctaaaactctcgaacaa 
tcaacagactggtttagtattgttggagatggttatgttgcactattacaaatgattgtc 
atgccactaatattcatttcaattgttgccgcttttagcaaaatacaaattggtgaaaaa 
ttcgctaagatcggttcttatatttttatgtttttaattggtactgtagccattgcagct 

atcgttggaattttttacgctttgatctttggtttagatgcatcgtctattgatttaggt
agtgcagaacattcacgtggtacagaaatttcaaaacaagccaaagatttaactgcaaac 
actttaccacaacaaattctcgaagtattcccaagcaatccatttttagatttcacagga
caacgtacaacttcgacaattgcagttgttatttttgcaacgtttgtgggctttgcttat 
cttagagttgcaagaaaacagccggaacatggaagcttactt 

Sequence 46

MHEQKQKEVALHDQTQEWKRLEQSLNIEPINFPEKGIDRYETAKSHKQSLERDKSLREER
LSILNKEAESINPVDQKYIDSFNSLYQQETEIKQKEFELRSIEKDIADKQRELEALQSIG
IVFGIVLHLIYGAESKTLEQSTDWFSIVGDGYVALLQMIVMPLIFISIVAAFSKIQIGEK
FAKIGSYIFMFLIGTVAIAAIVGIFYALIFGLDASSIDLGSAEHSRGTEISKQAKDLTAN
TLPQQILEVFPSNPFLDFTGQRTTSTIAVVIFATFVGFAYLRVARKQPEHGSLL

Sequence 47 

Contig_0444_pos_4472_4089,

is similar to (with p-value 7.0e-18)

>gp:gp|U40604|LMU40604_2 Listeria monocytogenes ClpC ATPase
(mec) gene, complete cds. NID: g1314293.

gtgaaacagatacttcaacaccttgctgcaaaacatggtattaattttcatgagatggca 
tttaaagaagaaaaaaaatgcccaacgtgtcagatgacacttaaggatattgcacatgtt
ggtaagcttgggtgtgctgattgttatgctacgtttaaagaagacatcattgatatagtt
caacgtgttcaaggtggtcaatttgaacatgtaggaaaaacaccacaatcatcgtataag 
aaacttgcaataaaaaagcaaattgaagaaaaatcaaaatatctaaataaattgatagat 
ggtcaagagtttgaagaggcagcgattgttcgtgatgaaattaaagctttaaaaagtgag 
agcgaggtgtctcatgatgagtaa

Sequence 48

VKQILQHLAAKHGINFHEMAFKEEKKCPTCQMTLKDIAHVGKLGCADCYATFKEDIIDIV 
QRVQGGQFEHVGKTPQSSYKKLAIKKQIEEKSKYLNKLIDGQEFEEAAIVRDEIKALKSE 
SEVSHDE*

Sequence 49

Contig_0444_pos_3078_625,

is similar to (with p-value 0.0e+00) 

>sp:sp|P37571|MECB_BACSU NEGATIVE REGULATOR OF GENETIC
COMPETENCE MECB. >gp:gp|D26185|BAC180K_148 B. subtilis DNA,
180 kilobase region of replication origin. NID: g467326.
>gp:gp|U02604|BSU02604_2 Bacillus subtilis Marburg 168 ClpC
adenosine triphosphatase (mecB) gene, complete cds, orfX and
orfY, partial cds. NID: g442358. >gp:gp|Z99104|BSUB0001_86
Bacillus subtilis complete genome (section 1 of 21): from 1 to
213080. NID: g2632267.

atgttatttggtagattgacagagcgtgcacaacgtgtgttggcacatgcacaagaggaa 
gcaattcgtttgaaccattctaatattggaacagaacatcttttgcttggtttaatgaaa

gagccagaaggtatagcagcaaaggtattagtaagttttaatattactgaagataaagtc
atcgaagaagttgaaaaacttatcggtcacggtcaagagcaaatgggcacactacattat
acaccgagagcaaaaaaagtaattgaactgtctatggatgaagctcgaaagctacatcat 
aactttgtaggaacagagcatatactattaggtttaattagagaaaatgaaggtgttgca 
gcacgtgtatttgcaaacctagatttaaatattactaaagcacgtgcccaagttgtaaaa 

gctttaggaagtccagaaatgagtaataaaaatgcgcaagctaataagtctaataacacg
cctactttagatggattagctagagatttaactgttattgctaaagatggaacgttagat 
ccagtcgtaggacgagataaagaaattactcgtgtaattgaagttttaagtcgtcgtact 
aaaaataatcctgtgctaattggtgaacccggtgttggtaaaacagcaattgctgaaggg
cttgcgcaagcaattgttaaaaatgaagtaccagaaactttaaaagacaaacgtgtaatg 

tcattagatatgggtacagtcgtagctggcactaaatatcgtggtgaatttgaagaaaga
ttgaaaaaagttatggaggaaatccatcaagctggtaatgttattctatttatcgatgaa 
cttcatactttagttggcgctggtggcgcagaaggagcaattgatgcatctaatatttta 
aaacctgctttagctcgtggagaattgcaatgtataggtgccacaacattagatgaatat 
cgtaaaaatatagaaaaagacgctgcattagaacgtcgttttcaaccaattcaagtggat 

gaacctacagttgaagacacgattgaaatcttaaaaggattacgtgaccgttatgaggct
catcacagaattaatatctcagatgaagctttagaagcggctgctaaattgagtgatcgc 
tatgtttcagatcgtttcttgccagataaagccattgacttaattgatgaggcaagttca 
aaagttagacttaaaagtcatacaacgccaagtaatttaaaagagattgaacaagaaatt 
gataaagtaaaaaatgaaaaagatgctgcagttcatgctcaagaatttgaaaatgccgct 

aatttaagagataagcaatctaaacttgaaaagcaatatgaagatgctaaaaatgaatgg
aaaaatgcacaaggtggtttagatactgccttatctgaagaaaatatcgctgaagtaata 
gctggttggacaggtattcctttaactaaaattaatgaaactgaatcagatcgtttattg 
aatcttgaagatacacttcataaacgtgtcattggacaaaacgatgctgtcaattcaatt 
agtaaagctgttagaagagctcgtgctggtcttaaagatccaaaacgtccaatcggtagt

tttattttcttaggacctacaggtgtgggtaaaactgaattggctcgtgctttagctgaa
tctatgtttggtgaagacgatgcaatgattcgcgtagatatgagtgaatttatggagaaa 
catgctgtcagtcgattagttggtgcacctccaggatatgtaggacatgatgacggcggt 
caattgactgaaaaagttagacgtaaaccatactctgtgattttatttgatgaaattgag 
aaagcacatcctgacgtatttaatattcttctacaagttttagatgatggtcatttaaca 
gatactaaaggtcgtactgtggacttccgtaatactgtgattattatgacttctaatgtg
ggagctcaagaattacaggaccaacgctttgctggttttggaggtgcttcagaaggtagt 
gactacgaaactgtcagaaaaacaatgatgaaagaattaaaaaattcattccgaccagaa 
ttcttaaaccgtgttgatgacattattgtcttccacaaacttacaaaagatgaattaaaa 
gaaattgttacaatgatggtaaataaacttactcaccgtctttcagagcaaaatattaat
attgttgttactgataaagcgaaagaaaaaattgcagaagaaggatatgatcctgaatat 
ggtgctagaccactcattagagcaattcaaaaaacggttgaagataatttaagcgaattg 
attttagatggaaataaaattgaaggtaaagaagtaacaattgatcatgatggtaaagaa 
tttaagtatgatatttatgaaattacagctaaaaaagaaacaacagaatcataa 

Sequence 50 

MLFGRLTERAQRVLAHAQEEAIRLNHSNIGTEHLLLGLMKEPEGIAAKVLVSFNITEDKV 
IEEVEKLIGHGQEQMGTLHYTPRAKKVIELSMDEARKLHHNFVGTEHILLGLIRENEGVA 
ARVFANLDLNITKARAQVVKALGSPEMSNKNAQANKSNNTPTLDGLARDLTVIAKDGTLD 

PVVGRDKEITRVIEVLSRRTKNNPVLIGEPGVGKTAIAEGLAQAIVKNEVPETLKDKRVM
SLDMGTVVAGTKYRGEFEERLKKVMEEIHQAGNVILFIDELHTLVGAGGAEGAIDASNIL 
KPALARGELQCIGATTLDEYRKNIEKDAALERRFQPIQVDEPTVEDTIEILKGLRDRYEA 
HHRINISDEALEAAAKLSDRYVSDRFLPDKAIDLIDEASSKVRLKSHTTPSNLKEIEQEI 
DKVKNEKDAAVHAQEFENAANLRDKQSKLEKQYEDAKNEWKNAQGGLDTALSEENIAEVI 

AGWTGIPLTKINETESDRLLNLEDTLHKRVIGQNDAVNSISKAVRRARAGLKDPKRPIGS
FIFLGPTGVGKTELARALAESMFGEDDAMIRVDMSEFMEKHAVSRLVGAPPGYVGHDDGG
QLTEKVRRKPYSVILFDEIEKAHPDVFNILLQVLDDGHLTDTKGRTVDFRNTVIIMTSNV 
GAQELQDQRFAGFGGASEGSDYETVRKTMMKELKNSFRPEFLNRVDDIIVFHKLTKDELK
EIVTMMVNKLTHRLSEQNINIVVTDKAKEKIAEEGYDPEYGARPLIRAIQKTVEDNLSEL 

ILDGNKIEGKEVTIDHDGKEFKYDIYEITAKKETTES*

Sequence 51 

Contig_0445_pos_1513_1908,

is similar to (with p-value 1.0e-34) 

>gp:gp|AF009352|AF009352_4 Bacillus subtilis osmoprotectant
transport system OpuC including ATPase (opuCA), transmembrane
protein (opuCB), osmoprotectant binding protein precursor
(opuCC) and transmembrane protein (opuCD) genes, complete cds.
NID: g2271388.

atgattgaatgtaacaaattacctttcattacttatgaccacctttcttttcttcaaagt
aatgatgtttctttaaatattcttcagctatcactgcaggttccttaccttttccatccg 
cttcataatttaacttctgcatttcttctgttgagattttaccttctaatttttttagtg 
ccttatcgatttctggattatcctttattaattgttcatttgcaagtggactaccgtcat
aaggcgggaagaatttgcgatcatcttccaatattttcaaatcataagctgcaatacgtc 

catctgttgaatacccaactgctacatctaatttattattttttaatgcatcatatacta
aaccaatttgcattggacgtgcactatcaaatttaa 

Sequence 52 

MIECNKLPFITYDHLSFLQSNDVSLNILQLSLQVPYLFHPLHNLTSAFLLLRFYLLIFLV 
PYRFLDYPLLIVHLQVDYRHKAGRICDHLPIFSNHKLQYVHLLNTQLLHLIYYFLMHHIL
NQFALDVHYQI*

Sequence 53 

Contig_0445_pos_8150_8581,

putative peptide of unknown function

atgttcaaaaatatattattaccctatgatttcgaaaatgattttagtgctatccctgac 
tatttagaaaaagtcaccgatgaagattcagttgttgtaatttatcacgttgtaacagaa
aatgatcttgcaattagtgtcaagtattataataagcataaagaagatattattagagaa
aaagagaaaaaactcactccatttttacgtgaattagaaaaaagagatattcaatataaa 

atagatgtagattttgggcatattaaagatacaatcttagaaaaaattacttctggagat
ataaataatggtgaatttgatttagtaattatgagtaatcatagagtcgatttgaatatt 
aaacatgttttaggagatgttacacataagattgctaaaagaagttctgtcccagtacta 
attgttaaataa 

Sequence 54

MFKNILLPYDFENDFSAIPDYLEKVTDEDSVVVIYHVVTENDLAISVKYYNKHKEDIIRE
KEKKLTPFLRELEKRDIQYKIDVDFGHIKDTILEKITSGDINNGEFDLVIMSNHRVDLNI
KHVLGDVTHKIAKRSSVPVLIVK*



Sequence 55 

Contig_0445_pos_7486_7115,

putative peptide of unknown function

gtggattacatggcaagacattttggagtttactatagcttgacaactatttctcgtgac 
ttacaagaattagaaatttacaaaatccctgttgaaaataaaaagtatatttacaagaaa 
ataaatcaaacaaatcaattaagtgcaaaaaaacaattagaaatatttagtgatgagatt
attgaatttataacgctaaataactatgtcttaataaaaacatctcctggctttgctcaa 

agtataagttattacatagatcaattacaaatgaaagaaatattaggaattattggaggt
aacgatactttgatgattttgacttcttcaaatgaaatagcagaatttgtttgttatcaa 
ttattcccttaa 

Sequence 56 

VDYMARHFGVYYSLTTISRDLQELEIYKIPVENKKYIYKKINQTNQLSAKKQLEIFSDEI
IEFITLNNYVLIKTSPGFAQSISYYIDQLQMKEILGIIGGNDTLMILTSSNEIAEFVCYQ 
LFP* 

Sequence 57 
Contig_0445_pos_6795_5611,

is similar to (with p-value 0.0e+00) 

>gp:gp|Y17554|BLY17554_1 Bacillus licheniformis arcA, arcB,
arcC and arcD genes. NID: g3687415. 

gtgttgttaaaaagaccaggaaaagaattagaaaatttagtacctgatcatttaagtggt 

ttattattcgatgatattccctacttaaaagttgcacaagaagagcatgacaaatttgct
caaactttgagagatgaaggaatcgaagtagtttatttagaaaaacttgcagcagaatct 
attactgagccagaagtacgcgagaacttcataaacgacatattaacagaatctaaaaag 
acaatattaggtcatgaaactgaaattaaagaattcttttcaaagttatctgaccaagaa 
cttgtaaataaaatcatggctggcgtacgtaaagaagaaattcaacttgaaacaacccat 

ttagtagaatatatggatgatagatatccattttacttagatccaatgcccaacctttat
tttacaagagatccccaagcttcaattggtagaggaatgacaattaacagaatgtattgg 
agagcacgacgtagagaatctatttttatgacatatatactgaaacatcatccaagattt 
aaagataaagatgtaccagtatggttagatcgtaactcaccatttaatattgaaggtgga 
gatgaattagtattatcgaaagatgttttagctattggtatatcagaacgtacatcagct 

caagcaatagaaaagttagcacgtaatattttcaaagatgcaaacacaagttttaaaaaa
atcgtagctattgaaatacctaatacacgtacatttatgcacctagatacagtactaact 
atgattgactacgataagtttacagtacatgcagcaatatttaaagaagaaaataatatg 
aatatatttaccatagaacaaaatgatggtaaggacgatataaaaattactcgttctagc
aagttacgtgaaacacttgctgaagttttagaagtagaaaaagtggactttattccaaca 

ggtaatggcgacgttattgatggtgcacgtgaacaatggaatgatggctcaaacacatta
tgtattcgaccaggggttgtggtgacatacgatcgcaactatgtatcaaaccaactttta 
cgcgacaaaggaattaaagtgattgaaattactggtagtgaacttgtacgtggacgcgga 
ggcccaagatgtatgagtcagccgttatttagagaagatatttaa 

Sequence 58

VLLKRPGKELENLVPDHLSGLLFDDIPYLKVAQEEHDKFAQTLRDEGIEVVYLEKLAAES 
ITEPEVRENFINDILTESKKTILGHETEIKEFFSKLSDQELVNKIMAGVRKEEIQLETTH 
LVEYMDDRYPFYLDPMPNLYFTRDPQASIGRGMTINRMYWRARRRESIFMTYILKHHPRF
KDKDVPVWLDRNSPFNIEGGDELVLSKDVLAIGISERTSAQAIEKLARNIFKDANTSFKK 

IVAIEIPNTRTFMHLDTVLTMIDYDKFTVHAAIFKEENNMNIFTIEQNDGKDDIKITRSS
KLRETLAEVLEVEKVDFIPTGNGDVIDGAREQWNDGSNTLCIRPGVVVTYDRNYVSNQLL
RDKGIKVIEITGSELVRGRGGPRCMSQPLFREDI*






Sequence 59 

Contig_0445_pos_5525_4104,
is similar to (with p-value 0.0e+00)
>gp:gp|Y17554|BLY17554_3 Bacillus licheniformis arcA, arcB,
arcC and arcD genes. NID: g3687415. 

atggatgaaaataaattaggtaaaacttccttaattggtttagtcataggctctatgata 
ggcggtggtgcattcaatatcatctcagatatgggtggccaagctggtggacttgcaata 
attatcggttggataataactgctattggtatgatttctcttgctttcgtatttcaaaat 

ttaacaaatgagcgaccagatcttgatggaggaatttatagttatgctcaaacagggttt
ggagattttattggtttttcaagtgcttggggatattggtttgcagcatttctaggtaat 
gtggcttatgcaaccctattaatgtcagctgtgggtaactttttccctatatttaaagga
ggtaacacacttccaagtattatcatagcatcaattttattatggggtgtacatttttta 
atacttagaggtgtagaaactgcagcgtttataaatagtattgttacagtagctaaatta 

atacctatatttctagttattatatgcatgatagttgtattcaacttcagtacttttaaa
tccggtttttatggtatgactagtggaagtgttggcgtttttagttggggagatacaatg 
gcacaagtaaaaagtactatgttagtaactgtatgggtattcacagggattgaaggagcc 
gttgtcttttctggacgtgcaaagtctaaaaaggatgtaggaactgctaccgttattggt 
ttgatttctgtgctagtcatttatttcttaatgactgtactagcccaaggtgtcattcag 

cagaaccaaatttcaaaacttgctaatccatcaatggcacaagtattagaacatattgta
ggtcattggggttcagtgttagttaatataggcttaattatctctgttttaggagcttgg 
ttaggatggacattactagctggtgaattaccattcattgtagctaaagatggacttttc 
ccgaaatggtttgctaaagaaaataagaataaagctccggtcaacgctttaattattact
aatatattagttcagttatttttaattagtatgttgtttacagatagtgcctatcagttt 

gcgttttcacttgcatcaagtgcaatcttaattccatatacactcagtgctttttaccag
gttaaatatactattcaaaataaatctaaagctaatttaaaacaatggataataggaatt 
attgcatctatttacacaatttggttggtttatgcagctggattagattatttactatta 
acgatgttgttatatatacctggattactcgtatacagctacgtacaaagggataalaac 
aaacatttgacaaaattggattatacgttattcatattcatcattgtacttgcaataata 

ggaatagttcgtttgattacaggtaatatttctgtattttaa

Sequence 60 

MDENKLGKTSLIGLVIGSMIGGGAFNIISDMGGQAGGLAIIIGWIITAIGMISLAFVFQN
LTNERPDLDGGIYSYAQTGFGDFIGFSSAWGYWFAAFLGNVAYATLLMSAVGNFFPIFKG

GNTLPSIIIASILLWGVHFLILRGVETAAFINSIVTVAKLIPIFLVIICMIVVFNFSTFK
SGFYGMTSGSVGVFSWGDTMAQVKSTMLVTVWVFTGIEGAVVFSGRAKSKKDVGTATVIG
LISVLVIYFLMTVLAQGVIQQNQISKLANPSMAQVLEHIVGHWGSVLVNIGLIISVLGAW
LGWTLLAGELPFIVAKDGLFPKWFAKENKNKAPVNALIITNILVQLFLISMLFTDSAYQF
AFSLASSAILIPYTLSAFYQVKYTIQNKSKANLKQWIIGIIASIYTIWLVYAAGLDYLLL

TMLLYIPGLLVYSYVQRDNNKHLTKLDYTLFIFIIVLAIIGIVRLITGNISVF*

Sequence 61 

Contig_0445_pos_4062_3373,

putative peptide of unknown function 

atgtatgaagaaaatatttatattaaaaattcagaatatgaatttgataataatcttaaa
caattagcatcatacttaaatattcctgttagtattgttagaccttataaagaggattta 
acactttatcaatataaaaaaggacaagtcatatatcattcaactgatcaaataaaattt 
gtatactttttagtaaatggttgtattttacatgaatcttctaatattactggtgacaat 
tatttaagattaagtaaagacgaaaatatatttccaatgaacttcatatttaatgaaacc 

cctgcaccatatgaaatatgtacagctttgacagattgtaaaatattaactttaccgaaa
gatttacttgagtatttatgtagaaagcataatgaaatatttgaaagtctcttcaagaaa
cttaatgagactattcaatttcaagtagaatatattatggcgttaagagctaattcagct
aaagaaagaattgaaagaatactacaaattttatgcctttcaattggggatgataatgga
gaattctatgaattaaaacaaattatgactgttcaattaataagtaatttatctggactt 

aacagaaaaactactggtgaaataatcagagaattaaaaatagaaaatattatatatcaa
gataaaagaaattggattataaaaaaataa 

Sequence 62

MYEENIYIKNSEYEFDNNLKQLASYLNIPVSIVRPYKEDLTLYQYKKGQVIYHSTDQIKF 
VYFLVNGCILHESSNITGDNYLRLSKDENIFPMNFIFNETPAPYEICTALTDCKILTLPK 
DLLEYLCRKHNEIFESLFKKLNETIQFQVEYIMALRANSAKERIERILQILCLSIGDDNG
EFYELKQIMTVQLISNLSGLNRKTTGEIIRELKIENIIYQDKRNWIIKK*

Sequence 63 

Contig_0445_pos_3371_2502,

is similar to (with p-value 7.0e-53) 

>gp:gp|AF009352|AF009352_3 Bacillus subtilis osmoprotectant
transport system OpuC including ATPase (opuCA), transmembran
e protein (opuCB), osmoprotectant binding protein precursor
(opuCC) and transmembrane protein (opuCD) genes, complete cd
s. NID: g2271388. >gp:gp|Z99121|BSUB0018_68 Bacillus subtili
s complete genome (section 18 of 21): from 3399551 to 360906
0. NID: g2635827.

atgaataaaatattaattgaaaaggagatattcaaaatgaaaaatttaagaaatagaaac 
tttttaactttgttagacttcacacaaaaagaaatggaatttttacttaatttatctgaa 
gatcttaaacgcgcaaaatatgcaggaatagaacaacaaaaaatgaaaggtaaaaatatc 

gctctactttttgaaaaagattcaacacgcactcgatgtgcatttgaaacagcggcttat
gatcaaggtgcacatgtaacataccttgggccaacaggttctcaaatgggtaaaaaagag 
tctaccaaagatactgctcgtgttttaggtggagctgtccctttaggtattttattatca 
aaaacgcaacgcacagctaatgtggtattaacagttgctggcgtgcttcaaaccattcct 
actttggctgtgctagctatcatgattccaatatttggggtaggaaaaacaccagctatt

gttgcattatttatctatgtattattaccaattttaaataatacagtattaggtgttaaa
aatatcgataaaaatgtcattcaagctggtcaaagtatgggaatgactaaatttcaatta 
atgaaagatgtagaaatgcctttagctttaccacttattattagtggtattcgtctatca 
agtgtatacgtcattagttgggcaacactcgcaagttatgtaggtgcaggtggacttggg 
gatcttgtatttaatggattaaatctctatcaaccacctatgattattagtgcagcgatt 

gttgttactttattagcattagttattgactttatactttcattagttgaaaaatgggtt
gtacctaaaggattaaaagtatctagataa 

Sequence 64 

MNKILIEKEIFKMKNLRNRNFLTLLDFTQKEMEFLLNLSEDLKRAKYAGIEQQKMKGKNI
ALLFEKDSTRTRCAFETAAYDQGAHVTYLGPTGSQMGKKESTKDTARVLGGAVPLGILLS
KTQRTANVVLTVAGVLQTIPTLAVLAIMIPIFGVGKTPAIVALFIYVLLPILNNTVLGVK
NIDKNVIQAGQSMGMTKFQLMKDVEMPLALPLIISGIRLSSVYVISWATLASYVGAGGLG
DLVFNGLNLYQPPMIISAAIVVTLLALVIDFILSLVEKWVVPKGLKVSR*

Sequence 65

Contig_0445_pos_2434_1541,

is similar to (with p-value 7.0e-85) 

>gp:gp|AF009352|AF009352_4 Bacillus subtilis osmoprotectant
transport system OpuC including ATPase (opuCA), transmembran
e protein (opuCB), osmoprotectant binding protein precursor
(opuCC) and transmembrane protein (opuCD) genes, complete cd
s. NID: g2271388.

gtgttatctggatgcagtttaccaggtttaggtgatggaaatgcaaaagatgatgtgaaa 
atcacaacgactgaaacaagtgaaactaagattataggtcatatggaaaaattattaatt

gaacatgaaactgatggaaaaatcaaaccgacgttgattgggaacctaggttctagcatt
attcaacataatgcgttacaacgtggtgatgcaaatatgtcagcggtacgttacacaggt 
actgaattgacgagtgtattagcagctaaacctactaaagatcctgataaggccatgtct 
gaaacacaacgcttatttaaaaagaaatatgatgaaaagtattatcattcacttgggttt 
gcgaatacatacgcattcatggtgacaaaagaaacggctaaaaagtatcacttagaaaaa 

gtatcagatttagagaaatataaagatgaactacgtcttggaatggatacccaatggatg
aaccgtgcaggtgatggatatccagcttttgttaaagattatggatttaaatttgatagt 
gcacgtccaatgcaaattggtttagtatatgatgcattaaaaaataataaattagatgta 
gcagttgggtattcaacagatggacgtattgcagcttatgatttgaaaatattggaagat 
gatcgcaaattcttcccgccttatgacggtagtccacttgcaaatgaacaattaataaag 
gataatccagaaatcgataaggcactaaaaaaattagaaggtaaaatctcaacagaagaa
atgcagaagttaaattatgaagcggatggaaaaggtaaggaacctgcagtgatagctgaa 
gaatatttaaagaaacatcattactttgaagaaaagaaaggtggtcataagtaa 

Sequence 66

VLSGCSLPGLGDGNAKDDVKITTTETSETKIIGHMEKLLIEHETDGKIKPTLIGNLGSSI 
IQHNALQRGDANMSAVRYTGTELTSVLAAKPTKDPDKAMSETQRLFKKKYDEKYYHSLGF 
ANTYAFMVTKETAKKYHLEKVSDLEKYKDELRLGMDTQWMNRAGDGYPAFVKDYGFKFDS 
ARPMQIGLVYDALKNNKLDVAVGYSTDGRIAAYDLKILEDDRKFFPPYDGSPLANEQLIK 
DNPEIDKALKKLEGKISTEEMQKLNYEADGKGKEPAVIAEEYLKKHHYFEEKKGGHK*

Sequence 67 

Contig_0445_pos_1454_846,

is similar to (with p-value 9.0e-30) 

>gp:gp|AF009352|AF009352_5 Bacillus subtilis osmoprotectant
transport system OpuC including ATPase (opuCA), transmembran
e protein (opuCB), osmoprotectant binding protein precursor
(opuCC) and transmembrane protein (opuCD) genes, complete cd
s. NID: g2271388. >gp:gp|Z99121|BSUB0018_66 Bacillus subtili
s complete genome (section 18 of 21): from 3399551 to 360906
0. NID: g2635827.

atgtcggtatatggtgtgttgtttgcatgtataattggaattcctattggtattttcata 
gccaagtataaacgtttatcgtggccggtaattacaattgcaaatattatacaaactgtt 
ccagcaatcgctatgttagccatacttatgttggctatgggattaggaccaacaactgtt 

gttgtaactgtattcctatattcgttattacctattattaaaaatacttatactggtatt
gtagaagttgatgaaaatattaaagacgctggtaaaggtatgggaatgacggggaatcaa 
atattaagaatgatagagttaccattatctttatctgttattattggtggtgttagaatt 
gcacttgttgttgctatcggaatagtagcgattgggtcatttatcggtgctccaacacta 
ggtgatattattattcgtggtacaaattcaacagatggaacaacattcatcttagcaggt 

gccataccaattgctttaatagcaattatcatagatataggattacgttatctagaaaaa
cgtttagatcctactcgtaaaaacaaaaaagattcaatgcaaaaacatcaagtacaaaaa
ttacgttaa 

Sequence 68 

MSVYGVLFACIIGIPIGIFIAKYKRLSWPVITIANIIQTVPAIAMLAILMLAMGLGPTTV
VVTVFLYSLLPIIKNTYTGIVEVDENIKDAGKGMGMTGNQILRMIELPLSLSVIIGGVRI
ALVVAIGIVAIGSFIGAPTLGDIIIRGTNSTDGTTFILAGAIPIALIAIIIDIGLRYLEK
RLDPTRKNKKDSMQKHQVQKLR* 

Sequence 69

Contig_0446_pos_520_1677,

is similar to (with p-value 4.0e-36) 
>gp:gp|AF008930|AF008930_4 Bacillus subtilis choline transpo
rt system including ATPase (opuBA), transmembrane protein (o
puBB), choline binding protein precursor (opuBC) and transme
mbrane protein (opuBD) genes, complete cds; and unknown gene
. NID: g3068551. >gp:gp|Z99121|BSUB0018_57 Bacillus subtilis
complete genome (section 18 of 21): from 3399551 to 3609060
. NID: g2635827.

atgaaaccacttagaagattgaccaaagtcgaactccctattgcaatgcctgttatcatg
gcaggaatacgcacagctatggtattaatcattggtactgctacactcgcagctttaata 
ggcgctggtggtctaggagatttaatattattaggcattgatcgtaacaatagtgcactc 
attttaataggtgctattccagctgcacttctagctattatttttgattttattttaaga 
tacatggaacgtttatcatataaaaaattgctcatttctttagggacaattgtaattgtg 

attatcatagctattgccatacctatggcagcgcaaaaaggtgataaaatcacattcgca
ggcaagctaggttcagaaccgtcaattattacgaatatgtataaaatacttattgaagaa 
gacacagatgatactgtagaagtcaaagatggcatgggtaaaacctcattcttatttaat 
gcgcttaagtcagatgaaattgatggttatttagaatttacaggtactgtattaggtgaa
ttaacgaaagaagatttaaagtctaaaaaagaaaacgatgtatatcaacaagcaaagtct
agtttagaaaaaaaatatgatatgacaatgcttaaaccgatgaaatataataatacgtat
gcattagctgtaaaacgtgactttgcaaaaaaatatcaaattaagacaataggtgattta 
cgcaaggtagaagataaacttaaaccaggttttacattggaatttaatgatagaccagat 
ggatacaaagctgttaaaaaaacgtatcatcttaatctttctaatgttaaaactatggaa 
cctaaattacgttatactgcagttaaaaagggagatattaatctcatagacgcatactct
actgatgcagaattaaaacaatataacatggtagtattaaaagatgatcaacatgtattt 
cctccataccaaggagcaccgctatttaaagaaaaatatttaaaagaccatcctgaagtt 
aaaaaaccgctcaataaattggcgaatagaatcacagatgaagaaatgcaagaaatgaac 
tataaggtaacagtgaagaaagaggatccttataaagtagcaagagaatacttagaaaaa 
gaaaaattaataaaataa

Sequence 70 

MKPLRRLTKVELPIAMPVIMAGIRTAMVLIIGTATLAALIGAGGLGDLILLGIDRNNSAL 
ILIGAIPAALLAIIFDFILRYMERLSYKKLLISLGTIVIVIIIAIAIPMAAQKGDKITFA
GKLGSEPSIITNMYKILIEEDTDDTVEVKDGMGKTSFLFNALKSDEIDGYLEFTGTVLGE 
LTKEDLKSKKENDVYQQAKSSLEKKYDMTMLKPMKYNNTYALAVKRDFAKKYQIKTIGDL 
RKVEDKLKPGFTLEFNDRPDGYKAVKKTYHLNLSNVKTMEPKLRYTAVKKGDINLIDAYS
TDAELKQYNMVVLKDDQHVFPPYQGAPLFKEKYLKDHPEVKKPLNKLANRITDEEMQEMN 
YKVTVKKEDPYKVAREYLEKEKLIK* 

Sequence 71 

Contig_0446_pos_3200_2466,
is similar to (with p-value 3.0e-77) 

>sp:sp|Q06174|EST_BACST CARBOXYLESTERASE PRECURSOR (EC 3.1.1
.1). >pir:pir|JC1374|JC1374 carboxylesterase (EC 3.1.1.1) -
Bacillus stearothermophilus (strain IFO 12550) >gp:gp|D12681
|BACPBH7_1 Bacillus stearothermophilus esterase gene. NID: g
216313.

atgcaaattaaactaccaaaaccattcttttttgaagaagggaaacgtgcagtgttactt 
cttcacggctttacaggtaactctgctgatgtaagacaacttgggcgttatcttcaaaaa
aagggctatacatcttatgctccacaatatgaaggacatgcagcgcccccagaagaaata 
ttaaaatctagcccttttgtttggtttaaagatgttttagatggttatgattatttagta 
gatcaaggttacgaagaaatagcagtagctggtttatcattaggtggcgccttcgcatta 
aaactaagtttaaatcgtgatgtgaaggggattataactatgtgtgcacctatggagaat 
aaaacagaaggttcgatttatgaaggctttcttgaatatgcacgtaactttaaaaaatat 
gaaggcaaagatcaacaaacgattgatcaagaaatggaacaatttcatccaactgaaacc 
ctgagagaactgagtgacactctaaatggagttaaagaacatgtcgatgaagtaattgat 
ccaatacttgtcgtacaagcagaacaagatacaatgattgatcctcaatcagcaaattat 
atatataatcatgtcgattctgatgaaaaagaaatcaaatggtatcaacattcaggtcat 
gtgattaccattgataaagggcattgtcaaaatcctcctgtaattttagcgtatctggaa 
aatcatcagcagtaa 

Sequence 72 

MQIKLPKPFFFEEGKRAVLLLHGFTGNSADVRQLGRYLQKKGYTSYAPQYEGHAAPPEEI 
LKSSPFVWFKDVLDGYDYLVDQGYEEIAVAGLSLGGAFALKLSLNRDVKGIITMCAPMEN
KTEGSIYEGFLEYARNFKKYEGKDQQTIDQEMEQFHPTETLRELSDTLNGVKEHVDEVID
PILVVQAEQDTMIDPQSANYIYNHVDSDEKEIKWYQHSGHVITIDKGHCQNPPVILAYLE
NHQQ* 


Sequence 73 

Contig_0446_pos_947_468,
putative peptide of unknown function 
gtgtcttcttcaataagtattttatacatattcgtaataattgacggttctgaacctagc
ttgcctgcgaatgtgattttatcacctttttgcgctgccataggtatggcaatagctatg
ataatcacaattacaattgtccctaaagaaatgagcaattttttatatgataaacgttcc 
atgtatcttaaaataaaatcaaaaataatagctagaagtgcagctggaatagcacctatt 
aaaatgagtgcactattgttacgatcaatgcctaataatattaaatctcctagaccacca
gcgcctattaaagctgcgagtgtagcagtaccaatgattaataccatagctgtgcgtatt
cctgccatgataacaggcattgcaatagggagttcgactttggtcaatcttctaagtggt 
ttcattccaatgcctttagccgcttcaataagagagggatcgacctccttaataccttaa 


Sequence 74

VSSSISILYIFVIIDGSEPSLPANVILSPFCAAIGMAIAMIITITIVPKEMSNFLYDKRS 
MYLKIKSKIIARSAAGIAPIKMSALLLRSMPNNIKSPRPPAPIKAASVAVPMINTIAVRI 
PAMITGIAIGSSTLVNLLSGFIPMPLAASIREGSTSLIP* 


Sequence 75 

Contig_0447_pos_18108_18413,
putative peptide of unknown function 

atgtcttctcatcctatccacattaatattttgatagattgtagagttaatttatctgga 
acacgtaaatcatcaaaaactacacctaaatgttctttatatttgaagtcgttctctgtc
acctctttatcgaagaatttcagtgaaccattatctttatgtctatttcctacaagtgta 
ttaatgagggtagacttacctgaaccatttttcccaattaaacccactacctcgccaggt 
ttaacagagaatgtgatgtcagtaagctggaattctgaattcttatatgatttattgagt
tgctga 


Sequence 76 

MSSHPIHINILIDCRVNLSGTRKSSKTTPKCSLYLKSFSVTSLSKNFSEPLSLCLFPTSV 
LMRVDLPEPFFPIKPTTSPGLTENVMSVSWNSEFLYDLLSC* 

Sequence 77

Contig_0447_pos_22881_22129,
putative peptide of unknown function 

gtgaggcaaatgtcacagtatccactttggaatcaattaaatactttaaaagaggctcag 
tgggtcgatttaacacatactttcgacccaaatatccctcgttttagcgaatttgaaaaa 

ggtgaagtctcaacgctattcaatgttaaagatcatgggttttatgtacaacgttggagt
atcgtaactcaatatggaacacacattgatgctccaatccatttcgttgaaaatagaaga 
tatttagaagaattagatttaaaagaacttgttttaccattaattgttttagattattct 
aaagaagctgcacaaaattcagattttatcgtatcacgtaaacatttagaagattgggaa 
caacaacacggtcgcattgaagcaggtacttttgtcgcattacgtactgattggtcaaaa 

cgttggccagatatagaaaaatttgaaaataaagatgtagatggccatcaacatcttcca
ggttggggccttgatgcattaaaatttctcattgaagaacgtggtgttaaatccataggt 
cacgaaacatttgatactgatgcctcaattgatacagctaaaaatggtgatattgttggc 
gaaagatatatcttaggtcaagacacattccaagtcgaattacttaccaatttagatcaa 
ttacctaccagaggtgcaattatctatgcaatcagcccaaaaccaaaagatgcaccaggc 

tttccagttcgtgcattcgcaataaaaccttaa

Sequence 78 

VRQMSQYPLWNQLNTLKEAQWVDLTHTFDPNIPRFSEFEKGEVSTLFNVKDHGFYVQRWS 
IVTQYGTHIDAPIHFVENRRYLEELDLKELVLPLIVLDYSKEAAQNSDFIVSRKHLEDWE 
QQHGRIEAGTFVALRTDWSKRWPDIEKFENKDVDGHQHLPGWGLDALKFLIEERGVKSIG
HETFDTDASIDTAKNGDIVGERYILGQDTFQVELLTNLDQLPTRGAIIYAISPKPKDAPG
FPVRAFAIKP* 




Sequence 79

Contig_0447_pos_19018_18668,
putative peptide of unknown function 

gtgtatttatttatagtccaaaacgcgtcacttcactataaagctaaaattgacgcaaat 
atttcagatgatttagcagatacatatgaaaataaatcatacatcaaatcattgaaagta
agatttatttacacaatgcaattaattgtcgcttttattgcaattttaatacccgtcata 
ggaaatgcatctgagaatcacatcgctctaataatgattcctttcattattacaatcatt
tcatccataatgattgggatattttatagaaaatttgatgctcgataccctaaattagga 
gagaaacgttacactgaaaaagcatttaatattatggacgaaggagagtga 

Sequence 80

VYLFIVQNASLHYKAKIDANISDDLADTYENKSYIKSLKVRFIYTMQLIVAFIAILIPVI
GNASENHIALIMIPFIITIISSIMIGIFYRKFDARYPKLGEKRYTEKAFNIMDEGE* 


Sequence 81 

Contig_0447_pos_18433_18116,
putative peptide of unknown function 

atggaaaacttattagaagttcagcaactcaataaatcatataagaattcagaattccag 
cttactgacatcacattctctgttaaacctggcgaggtagtgggtttaattgggaaaaat
ggttcaggtaagtctaccctcattaatacacttgtaggaaatagacataaagataatggt 
tcactgaaattcttcgataaagaggtgacagagaacgacttcaaatataaagaacattta 
ggtgtagtttttgatgatttacgtgttccagataaattaactctacaatctatcaaaata 
ttaatgtggataggatga 


Sequence 82 

MENLLEVQQLNKSYKNSEFQLTDITFSVKPGEVVGLIGKNGSGKSTLINTLVGNRHKDNG 
SLKFFDKEVTENDFKYKEHLGVVFDDLRVPDKLTLQSIKILMWIG* 

Sequence 83

Contig_0447_pos_18109_17288,
putative peptide of unknown function 

atgttcaagaacaacaaaagtattgaagatacttatgcaacaaaacctattattcagaat 
atcgttggtcaggcacaaatcaaacaagtgatggcgaaacaaacacccatgagatatacg 

ttgaaagctatcatggctggttttctattatcaatagttacagtttttatgttagcaatt
aaaacacaattcgcttcaacgcataatgacgggttaatcaatttgatgggagctattgcg
tttagtttaggtctcgtattagttgtgttaaccaattctgaattattaactagtaatttt
atgtatctgactgttggttggtattataaagcaattagtgtaagtaaaatgatatggatt 
tttattttctgttttataggtaatatcttaggtggatttattttatttttcctcatgaaa 

tatgcacatgttatgacgccagaaatgacagatagtttaacagcattagtacataaaaaa
acagtagaatcgacttggttaaatattttgattaaaggtatattttgtaatttctttatt 
aatatcggtatttttatttcaatgcagtttaaagagggactagccaaagcattctttata 
gcttgtggagtgattgtctttgtatttatgggttacgaacacgttgtttttaacgctgga 
ttatatgcaggtatgatgttctttaatatggatggattatcttggttgggtgtgctaaaa 

aatattgtttttgcattccttggaaactatatcggtggaggtatctttattggattagtg
tatgcatatttgaacggtaaacgtgacagcctccaaccatag 

Sequence 84 

MFKNNKSIEDTYATKPIIQNIVGQAQIKQVMAKQTPMRYTLKAIMAGFLLSIVTVFMLAI 
KTQFASTHNDGLINLMGAIAFSLGLVLVVLTNSELLTSNFMYLTVGWYYKAISVSKMIWI
FIFCFIGNILGGFILFFLMKYAHVMTPEMTDSLTALVHKKTVESTWLNILIKGIFCNFFI
NIGIFISMQFKEGLAKAFFIACGVIVFVFMGYEHVVFNAGLYAGMMFFNMDGLSWLGVLK 
NIVFAFLGNYIGGGIFIGLVYAYLNGKRDSLQP* 


Sequence 85 

Contig_0447_pos_16854_16309,

is similar to (with p-value 2.0e-31) 

>sp:sp|P54951|YXEL_BACSU HYPOTHETICAL 19.0 KD PROTEIN IN IDH
-DEOR INTERGENIC REGION. >gp:gp|Z99124|BSUB0021_55 Bacillus
subtilis complete genome (section 21 of 21): from 3999281 to
4214814. NID: g2636442. >gp:gp|D45912|D45912_15 Bacillus su
btilis genome sequence between the iol and hut operon, parti
al and complete cds. NID: g1408482.

gtgaaaaatatgtcgaatattaatattagagtggcacatgaacaagatgctgaagaatta 
catagcatcatgcaaattgcttttacacctttaagagaactaggtattgattggccatca
gttcacgctgatcttgaaatggtaaaggataatttaagacaaaatactacatttgtactt
gaaaatgaaaaagaaattatttcaacgattacggtttgctatgcatggagtagtgtaaaa
cccatttcaggttatccgttcgtttggtggtttgcaacacgaccaacttatgatggacaa
gggtatgggagtcaacttttaaaatatgtagaggagacatttttacgcgatactttaaaa 
gctgctgcggtaaccttaggaacatcagcacgtttgcacccttggttattaaacatttac 
gaaaagcggggttatgaaatatacgctaaacatgaaaatgatgatggtgatttaggagtc 
ataatgcgtaaaattttaataccagaacaatttaatgatgacattttgggccgaccgcca
ttttag 

Sequence 86 

VKNMSNINIRVAHEQDAEELHSIMQIAFTPLRELGIDWPSVHADLEMVKDNLRQNTTFVL 
ENEKEIISTITVCYAWSSVKPISGYPFVWWFATRPTYDGQGYGSQLLKYVEETFLRDTLK
AAAVTLGTSARLHPWLLNIYEKRGYEIYAKHENDDGDLGVIMRKILIPEQFNDDILGRPP
F* 

Sequence 87 
Contig_0447_pos_13818_13129,

putative peptide of unknown function 

atgagaaaaggaaatcagaatgaagctttagaagaatttatcggaactttattaaaagat
gagcaatattattatgagttagcatttttagaaagtgaaacacaaaatcttgaaatcata 
atggagaagatgattaagcaaggaattacaaaatttcgtattgtacctttactcattttt 

agtgcaatgcattatatcagtgatattccacaaatacttaaagagatgaaagctcgatat
ccacaaattgatagtaaaatgagtgcgcctcttggtacacatccatatatgaaaacatta 
gtagaaaatagaattgctgatgaaaaagtcagtgaaggttcaaccaaagcaactatagta 
attgcccatggaaatggaagtggacgttttacgaaagcacatgatgaattaaaagcattt 
gttaaaacgcttgatagtcatcatcctgtttatgcaagagctttatatgggacattagca 

tttaaaaatgatttagataaaatctcagagcaatatgacgagttagtcattgtcccatta
tttttatttgatggtagattggtgaataaagtaaaacgtcttttaggtgaaatgacattg
catagtcaattacacattacgccatcgattaactttgatccaattttaagattaattatt
agagaaagacttgaagcgttagatatttaa 

Sequence 88

MRKGNQNEALEEFIGTLLKDEQYYYELAFLESETQNLEIIMEKMIKQGITKFRIVPLLIF 
SAMHYISDIPQILKEMKARYPQIDSKMSAPLGTHPYMKTLVENRIADEKVSEGSTKATIV 
IAHGNGSGRFTKAHDELKAFVKTLDSHHPVYARALYGTLAFKNDLDKISEQYDELVIVPL 
FLFDGRLVNKVKRLLGEMTLHSQLHITPSINFDPILRLIIRERLEALDI*


Sequence 89

Contig_0447_pos_12691_10286,

is similar to (with p-value 0.0e+00)

>sp:sp|P42435|NASD_BACSU NITRITE REDUCTASE (NAD(P)H) (EC 1.6
.6.4). >gp:gp|D30689|BACNARB_4 Bacillus subtilis DNA around
narB region (nasB operon and nasA gene). NID: g710016. >gp:g
p|Z99105|BSUB0002_159 Bacillus subtilis complete genome (sec
tion 2 of 21): from 194651 to 415810. NID: g2632457. >gp:gp|
D50453|D50453_33 Bacillus subtilis DNA for 25-36 degree regi
on containing the amyE-srfA region, complete cds. NID: g1805
369.

atggcaaaacaaaaacttgtaatgattggtaatggtatggcaggtttaagaacgatagaa 
gagattttagaacgttcacaatcacaatttgatattactattattgggaaagaaccttat 
ccgaactataacagaattatgttatccaatattttacagaagaaaatgaccgtcgaagat 

acaattatgaatccttatgattggtatcaagagaataatattgaacttataaataatgat
ccagtggaaaaagttgataaagaaaacaaaatagttactacttctaaaggtattgaagta 
gagtatgacatttgtattttcgctactggatcaaaagcttttgtattacctataccaggt 
tcaaatcttcctagtgtcattggatggcgaacaattgatgatacaaataaaatgattgaa 
attgcccaaacgaaaaaacgcgcagttgtcattggtggaggtcttctaggcttagagtgt 

gccagaggacttctagatcaaggaatggaagtgacagttcttcatttagctgattggctc
atggaaatgcaattggatcgtaaagccggagaaatgcttaaagcggatttagaaaagcaa 
ggtatgaagattgaacttcaagcaaattctaaagaaatcattggtgataaagatgttgaa 
gctattaaattagctgacggtcgggtgattgaaacagatttagtagttatggctgttggt 
atcagaccttatactgaagttgctaaagatagtggattagatgtcaatagaggtattgtt 
gtaaatgattatatgcaaacatctgattctcatatttatgcagtcggtgaatgtgccgaa
catgatgggaaagtttatggattggtggcgccactttatgaacaaggcaaagtgctagca 
gattatttaactggtaaagaaacaaaaggttataaaggatctactactttcacttcactt 
aaagtatctggttgtgatttatatagtgcagggcaaattgttgaagatgaagatgtccat 
ggtgtggaaatttttaatagtgtcgacaatatctacaaaaaagtgtatttaagtcagggt
caagtcgttggtgctgtcttgtatggtgatactgatgatggatcacgattttataatatg 
atgaaaaaacatgaaacgcttgaagattatacacttgtttctttattgcataaaggtgat 
gaagatgcggggacatctattgctgatatgtctgatgatgaaacgatttgtggatgtaat 
ggtgttgataaaggaacaatcgtcaatgctattacaagtaaaggtttaacgtctgtagat 

gaagtgactaaagcaacaaaagcaggtaattcatgtggtaagtgtaaaggtcaaatcggt
gagttattacaatatacattaggtgacgactttattgctgcaaaaccaacaggtatttgt
ccatgtactgatttaacaagagaccaaattgtaactcaaatcagggctaaaaatctcaaa
tcatcaaaagaagtacgacacgttcttgatttcaaagataaagatggttgtcctaaatgt 
cgacctgcaattaattattatttaaatatggtttatccttttgaacatcgagacgaaaaa 

gattctcgcttcgctaatgaaagatatcatgcaaatatacaaaatgatggtactttctca
gtgattcctcaaatgcgcggtggtgttacagatgctgaccaactcattcgattaggagaa 
gttgctaaaaagtataacgtaccacttgttaaagtaacaggttcgcaacgtgtaggttta 
tatggattgaagaaagaagaattaccacaagtttggaaagatttaggaatgcgttctgct 
tctgcttatggtaaaaagacgcgttctgttaaaagttgcgttggtaaagagttttgtcgt 

tttggtacacaatacacaactcgactaggaataagacttgaaaaaacatttgaatatatt
gatacacctcataaatttaaaatgggagtatcaggttgtccgagaagttgtgtagagtct 
ggtgttaaagattttggcgtcatatctgttgaaaatggctaccaaatatttatcggaggt 
aatggtggtactgatgttactgtaggtaaattgttaacgacagttgaaaccgaagatgaa 
gtgattcaattatgtggtgccctcatgcagtattacagagaaacaggtgtttacgctgaa 

agaacagcaccatggttagaacgtatgggctttgaaaatgtcaagaatgtcttattaaat
caagaaaagcaaaaagaactgtatttaagaattatggaagccaaaaaagctgttgagaat 
gaaccatgggaaactattgttgaaaataaagaagcacaaaaaatctttgaagttgagaag 
gtgtaa 

Sequence 90

MAKQKLVMIGNGMAGLRTIEEILERSQSQFDITIIGKEPYPNYNRIMLSNILQKKMTVED 
TIMNPYDWYQENNIELINNDPVEKVDKENKIVTTSKGIEVEYDICIFATGSKAFVLPIPG 
SNLPSVIGWRTIDDTNKMIEIAQTKKRAVVIGGGLLGLECARGLLDQGMEVTVLHLADWL
MEMQLDRKAGEMLKADLEKQGMKIELQANSKEIIGDKDVEAIKLADGRVIETDLVVMAVG

IRPYTEVAKDSGLDVNRGIVVNDYMQTSDSHIYAVGECAEHDGKVYGLVAPLYEQGKVLA
DYLTGKETKGYKGSTTFTSLKVSGCDLYSAGQIVEDEDVHGVEIFNSVDNIYKKVYLSQG
QVVGAVLYGDTDDGSRFYNMMKKHETLEDYTLVSLLHKGDEDAGTSIADMSDDETICGCN 
GVDKGTIVNAITSKGLTSVDEVTKATKAGNSCGKCKGQIGELLQYTLGDDFIAAKPTGIC 
PCTDLTRDQIVTQIRAKNLKSSKEVRHVLDFKDKDGCPKCRPAINYYLNMVYPFEHRDEK 

DSRFANERYHANIQNDGTFSVIPQMRGGVTDADQLIRLGEVAKKYNVPLVKVTGSQRVGL
YGLKKEELPQVWKDLGMRSASAYGKKTRSVKSCVGKEFCRFGTQYTTRLGIRLEKTFEYI
DTPHKFKMGVSGCPRSCVESGVKDFGVISVENGYQIFIGGNGGTDVTVGKLLTTVETEDE
VIQLCGALMQYYRETGVYAERTAPWLERMGFENVKNVLLNQEKQKELYLRIMEAKKAVEN
EPWETIVENKEAQKIFEVEKV*


Sequence 91 

Contig_0447_pos_10283_9969,

is similar to (with p-value 3.0e-22) 

>sp:sp|P42436|NASE_BACSU ASSIMILATORY NITRITE REDUCTASE (NAD
(P)H) SMALL SUBUNIT (EC 1.6.6.4). >gp:gp|D30689|BACNARB_5 Ba
cillus subtilis DNA around narB region (nasB operon and nasA
gene). NID: g710016. >gp:gp|Z99105|BSUB0002_158 Bacillus su
btilis complete genome (section 2 of 21): from 194651 to 415
810. NID: g2632457. >gp:gp|D50453|D50453_32 Bacillus subtili
s DNA for 25-36 degree region containing the amyE-srfA regio
n, complete cds. NID: g1805369.

atgaaagctaaagaaaagattaaagttacaacaatgaatgaaatgattcctcaaataggc
aaaaaagtagttgtaaacgaaaaagaaataggtatttttctcacagataatggtgattta
tatgccattggaaatatatgtccacataaagaaggaccgttgtctgaagggactgtaagt
ggtgattatgtttactgtccgttacacgatcaaaaaatagctttaaaaactggagaagta
caacaacctgatacaggatgtgtagagacatacgaagtagaagttattgatggagatatt 
tacttatgtctataa 

Sequence 92

MKAKEKIKVTTMNEMIPQIGKKVVVNEKEIGIFLTDNGDLYAIGNICPHKEGPLSEGTVS 
GDYVYCPLHDQKIALKTGEVQQPDTGCVETYEVEVIDGDIYLCL*

Sequence 93 
Contig_0447_pos_9876_9043,

is similar to (with p-value 3.0e-32) 

>sp:sp|P29928|SUMT_BACME UROPORPHYRIN-III C-METHYLTRANSFERAS
E (EC 2.1.1.107) (UROGEN III METHYLASE) (SUMT) (UROPORPHYRIN
OGEN III METHYLASE) (UROM). >pir:pir|A42479|A42479 S-adenosy
l-L-methionine uroporphyrinogen III methyltransferase - Baci
llus megaterium >gp:gp|M62881|BACCOBA_1 Bacillus megaterium
S-adenosy-L-methionine:uroporphyrinogen III methyltransferas
e (COBA) gene, complete cds. NID: g142694.

gtgattttatttgatcgtctcgtaaatcctttcatcttacagtatgcttcttctcaaaca 
aaagtgatcaatgtgggaaagaaaccttattgtaaacacattcaacaagaggagattaat
caaaaaattgttgaagcagctaatcaatatcaatgtgtggtgagactaaagggaggagat
cctgcgatttttggtagaattacagaagaagtacaaacattagaaaatcatcatattcat
tacgagattgtccctggtgtgacatcagcaagtgctgccgtagcaactatgaatatggga
ttaacgatgcgttctatcgcaccgagtgtgactttctcaactggtcattttaaagattcg
gttaatcacgatacggatattaggaacttgattaatggaggcactttagctatttatatg
ggtgtgaaaagattaggtcaaattattaaacaaattgaatcatatacgaatgaagactac 
cccattgcaatagtgtttaatgcttcctgctacaatgaaaagattgttataggtcattta 
agtacgattgaagaacaattggtttctcaacaactagaaggtcatccaggcatatgcatt 
ttaggtaatatacttgatgacattaatcgtacgttattgaataataataagaatgacaag 
ggaaatctatatttaatcaagggagataaagaacgtgcaattgcaaaggctgaaacttta
tatgatgaaggaatccaatgtctgattgattttgaccatagctaccacatttctcaacaa 
aacgtgtataacgaaatgattaaacacaagagtattaaaacaatatatgtataa 

Sequence 94 

VILFDRLVNPFILQYASSQTKVINVGKKPYCKHIQQEEINQKIVEAANQYQCVVRLKGGD
PAIFGRITEEVQTLENHHIHYEIVPGVTSASAAVATMNMGLTMRSIAPSVTFSTGHFKDS
VNHDTDIRNLINGGTLAIYMGVKRLGQIIKQIESYTNEDYPIAIVFNASCYNEKIVIGHL
STIEEQLVSQQLEGHPGICILGNILDDINRTLLNNNKNDKGNLYLIKGDKERAIAKAETL
YDEGIQCLIDFDHSYHISQQNVYNEMIKHKSIKTIYV*


Sequence 95 

Contig_0447_pos_8864_5181,

is similar to (with p-value 0.0e+00) 

>gp:gp|AF029225|AF029225_1 Staphylococcus carnosus NarG, Nar
H, NarJ, and NarI genes, complete cds. NID: g3929521.

atgggaaaatttggattgaatttctttaaaccgacagaaaagtttaatggaaattggtcg 
gtattagagcataaaagtcgagaatgggaaaagatgtatagagaaagatggagccacgac 
aaagttgtgagaacgacgcatggtgttaactgtactggatcatgttcatggaaagtattt 
gtcaaaaatggcgtaattacatgggaaaatcaacaaattgattatccaagttgtggacct 
gatatgccagaatttgagccaagaggttgtccgagaggtgcatcattttcttggtatgag
tatagtccgttaagagttaaatatccttatattagaggtaaattattagatttatggacc
gaagcgcttgaagaacaaaaaggaaaccgaattgcggcatgggcatccatcgtagaaaat
gaagaaaaagccaaacaatataaagaagcaagaggtaaaggtggacacgtcagagcaaat 
tggaaagatgccacagatatcattgcagctcaaattttatacaccataaaaaaagatgga 
ccggatcgtattgctggatttactcctattcctgctatgtcgatgattagttatgcttca
ggagcaagatttattaatttgttaggtggagaaatgttaagtttttacgattggtatgct 
gatttaccacctgcatctccacaaatttggggtgagcaaacagacgtgccagaatccagt 
gattggtacaacgcctcatacataatgatgtggggatcaaacgttccattaacacgtaca
cctgacgcacattttatgactgaagttagatataaaggggcgaaagttatttcagtagca 
cctgattatgctgagaatgttaagttcgccgatcattggcttgcaccacatccggggaca
gatgcagcggttgcacaagcaatgacacatgttattttacaggaatattatgaaaatcaa 
ccgaatgatatgtttattaactatgctaagcaatattctgatatgccgtttgtcattatg 
ttagatgaagatgagaatggctataaagcaggtagattcttgcgtgcttctgatttaggg
atgtcaggtgaaaataatgaatggaagccagttattcaagacaaattgagccaacaatta
cttgttcctaatggcacaatggggcaacgctgggaagaagggaaaaaatggaatttgaaa 
cttgaaacagaggatggtacaccaattgatccaatgttatcaatggttgaaagtgactat 
catgttgaaacgattcaatttccatattttgatagcagtggtgatggtatctttgagaga 
cctattgcaacgagaactattcagttagctaacggagaagaagttaaaattgctacggtt 

tatgatttaatgacgagtcaatatggtgttcaacgttttgaacacgaactagaagctaca
tcttatgatgacgcatcttctaaatatactcccgcttggcaagaacaaattacaggtatc 
aaaaaagaattagtgacgaaagtggcaaaagaatttgcacaaaatgctattgatactggt 
ggacgctcaatgattattatgggggctggtatcaaccattggtttaactccgatactatt 
tatcgttcaattcttaacttagtactattgtgtggttgtcaaggcgttaacggtggtggt 

tgggcacactatgtaggacaagaaaaatgtcgaccaattgaaggatggaatactattgca
ttcgctaaagattggcaaggtcctccacgtttacaaaatggtacaagttggttctatttc 
gctacagatcaatggaagtatgaagaatcaaatgtagataaattaaaatcaccattagct 
gaaaatattaagcatcaacatccagctgattacaatgtaacagctgctcgtatgggctgg 
ttgccttcatatccacagtttaataaaaacagtctattatttggtgaagaggctaaagat

gaaggtgatgattcaaatgaagccatcttacaaaaagcgattgaatcagttaaaaataaa
gatacacaatttgcgatagaagatccagatttaagaaaaaaccatcctaaaacattattt 
gtatggagatctaatttaatttctagttcagctaaaggacaagaatactttatgaagcac 
ttgttaggtgcgcgctctggtttaatggcagagccaaatgaagatgataaaccagaggaa 
attaaatggcgcgaggatacagaagggaaacttgatttattagtatcacttgatttcaga 

atgactgcgacgccattatattcagatatcgttttacctgctgcaacttggtatgaaaaa
catgatttatcttctacagacatgcatccatttattcatccatttaacccagcgattgac 
ccattatgggaatcgcgttcggactgggatatttataaaactctaagtaaagctgtttca 
gaaatggcgaaagattatcttccaggtaaatttaaagatgtcgtaactacaccattagga 
catgattcaaaacaagaaatttcaactgaatacggtattgtaaaagattggtctaaagga

gaaattgaaggtgtgccaggtaaaacaatgcctaatttttctatcgtagagcgagactat
acacaaatttacgataaattcgttactgttggtccaaaactagaaaaagggaaaataggt 
gctcatggtgtgagttatagcgttagtgaagagtacgaagaacttaaaagtatagttgga 
acttggaatgatgataatactatttcagttaaaaatgatagaccgagaatagatacagcg 
agaaaagtagcagatgtcattttgaatatatcctctgctacaaacggcaaattatcacaa 

aagtcatatgaagatttagaaaatcaaacaggtatggaacttaaagatatttctaaagaa
cgtgcttctgaaaagatatcattcttaaacattacttctcaaccaagagaagtgattcca 
actgcagtattccctggctctaataaagatggaagacgctactcaccgtttacaactaat 
gttgaacgtttagtgccatttagaacactaactggacgtcaaagttattatatagatcat 
gaggtattccaacagtttggcgaaagtttaccggtatataaacctactttacctccaatg 

gtatttggtgctcgtgataaaaaagttaaaggtggacaagatacattagtgcttcgatac
cttacacctcatggaaaatggaatattcattcaacttatcaagataatgaacgcatgttg 
acgttgtttagaggtggaccagttgtatggatttcaaatgaagacgcagctgaccatggt 
attaatgataacgactggttagaagtatacaacagaaacggagttgttactgccagagct 
gtaacatctcatcgtatgcctagaggcacaatgtttatgtatcatgcacaagataaacat 

atagagacacctggttctgaaattactgatactcgtggaggttctcataatgcacctact
cgtattcacttgaaacctactcaattagtaggaggatatgcacaaattagttatcacttt 
aactattatggaccaattggaaatcaaagagatgagtatgtagctgttagaaaaatgaag 
gaggtcaattggcttgaagattaa 

Sequence 96

MGKFGLNFFKPTEKFNGNWSVLEHKSREWEKMYRERWSHDKVVRTTHGVNCTGSCSWKVF 
VKNGVITWENQQIDYPSCGPDMPEFEPRGCPRGASFSWYEYSPLRVKYPYIRGKLLDLWT 
EALEEQKGNRIAAWASIVENEEKAKQYKEARGKGGHVRANWKDATDIIAAQILYTIKKDG 
PDRIAGFTPIPAMSMISYASGARFINLLGGEMLSFYDWYADLPPASPQIWGEQTDVPESS 

DWYNASYIMMWGSNVPLTRTPDAHFMTEVRYKGAKVISVAPDYAENVKFADHWLAPHPGT
DAAVAQAMTHVILQEYYENQPNDMFINYAKQYSDMPFVIMLDEDENGYKAGRFLRASDLG 
MSGENNEWKPVIQDKLSQQLLVPNGTMGQRWEEGKKWNLKLETEDGTPIDPMLSMVESDY 
HVETIQFPYFDSSGDGIFERPIATRTIQLANGEEVKIATVYDLMTSQYGVQRFEHELEAT 
SYDDASSKYTPAWQEQITGIKKELVTKVAKEFAQNAIDTGGRSMIIMGAGINHWFNSDTI
YRSILNLVLLCGCQGVNGGGWAHYVGQEKCRPIEGWNTIAFAKDWQGPPRLQNGTSWFYF
ATDQWKYEESNVDKLKSPLAENIKHQHPADYNVTAARMGWLPSYPQFNKNSLLFGEEAKD 
EGDDSNEAILQKAIESVKNKDTQFAIEDPDLRKNHPKTLFVWRSNLISSSAKGQEYFMKH 
LLGARSGLMAEPNEDDKPEEIKWREDTEGKLDLLVSLDFRMTATPLYSDIVLPAATWYEK 
HDLSSTDMHPFIHPFNPAIDPLWESRSDWDIYKTLSKAVSEMAKDYLPGKFKDVVTTPLG
HDSKQEISTEYGIVKDWSKGEIEGVPGKTMPNFSIVERDYTQIYDKFVTVGPKLEKGKIG 
AHGVSYSVSEEYEELKSIVGTWNDDNTISVKNDRPRIDTARKVADVILNISSATNGKLSQ
KSYEDLENQTGMELKDISKERASEKISFLNITSQPREVIPTAVFPGSNKDGRRYSPFTTN 
VERLVPFRTLTGRQSYYIDHEVFQQFGESLPVYKPTLPPMVFGARDKKVKGGQDTLVLRY
LTPHGKWNIHSTYQDNERMLTLFRGGPVVWISNEDAADHGINDNDWLEVYNRNGVVTARA
VTSHRMPRGTMFMYHAQDKHIETPGSEITDTRGGSHNAPTRIHLKPTQLVGGYAQISYHF 
NYYGPIGNQRDEYVAVRKMKEVNWLED* 

Sequence 97 
Contig_0447_pos_5167_3638,

is similar to (with p-value 0.0e+00) 

>gp:gp|AF029225|AF029225_2 Staphylococcus carnosus NarG, NarH,
NarJ, and NarI genes, complete cds. NID: g3929521.
atggtattgaatctagacaaatgtattggttgtcatacttgcagtgtgacatgtaaaaac 

acatggacaaatcgacctggtgcagaatatatgtggtttaataacgtagaaacaaaaccg
ggtgtaggatatccaaaaagatgggaagaccaaggacaatataaaggtggttgggtgcta 
aataaaaaaggaaagcttgaattaaaatctggtaacagatggtcaaaaattgctttaggt 
aaaatcttctataatccagacatgccactcattcaagattattatgaaccgtggacatat 
aactatgaacacttaaccaatgctaaacaaggacagcactctcccgtggcgacagctcac 

tctttaatttcaggtgatagattgaatcttaaatgggggccaaactgggaagatgattta
gctggaggtcacattacaggaccagaggatccaaatattcagaaaatagaagaagatatt 
aaattccaattcgatgagacatttatgatgtatttaccaagactatgtgaacactgttta 
aatccaagttgcgtagcatcttgtccatcaggagctatgtataaacgagatgaggatggt 
atcgtactcgtcgatcaagaagcctgtcgaggttggagatactgtatgactggatgtccg 

tataaaaaagtatattttaactggaaaacgaataaagctgaaaaatgtacattttgtttc
ccacgaatcgaagctggtatgccaactgtttgttccgaaacttgtacaggacgtatgaga 
tatttaggtgttttattatatgacgcagatcgcgttcaagaagcggcttcagctaaagat 
gaaaaagacttatacgaaaaacaattagacctattccttgatccatttgatgaggaagtc 
attgcacaagctgaaaaagatggaataaatcaagaatggattacagcagctcaaaactca 

ccagtgtataaattggcaatagaatataaaatggcctttccattgcatcctgaatttaga
actatgccgatggtgtggtattgtccacctttaagtcctattatgagttatttcgaaggt
gaaaatgcaggtcaaaatccagatatgattttcccagctattgaggaaatgcgtttacct
attcaatacttagcaaatttattaactgctggcgacacaaaacctgttaaagagggctta 
caaaaaatggcgatgatgagaagttatatgcgttctcaaataacaaaccaacctttcgat 

acttctaaattagaacgattaggacttactgaaagacagatgactgaaatgtatcgctta
ctaggtattgctaaatatgaagatcgttttgttgtgccttcttcccataaagaaacatat 
ttagatacttataaagcgcaaggaagtcaaggttacggtggagagtactttggctctaat 
tgtgaaggttgtggtgttgcagttcaatcaggtaaaactggacaagaaatttataatgaa 
aatttctatggagggatcttccgtgattaa 


Sequence 98 

MVLNLDKCIGCHTCSVTCKNTWTNRPGAEYMWFNNVETKPGVGYPKRWEDQGQYKGGWVL 
NKKGKLELKSGNRWSKIALGKIFYNPDMPLIQDYYEPWTYNYEHLTNAKQGQHSPVATAH 
SLISGDRLNLKWGPNWEDDLAGGHITGPEDPNIQKIEEDIKFQFDETFMMYLPRLCEHCL 

NPSCVASCPSGAMYKRDEDGIVLVDQEACRGWRYCMTGCPYKKVYFNWKTNKAEKCTFCF
PRIEAGMPTVCSETCTGRMRYLGVLLYDADRVQEAASAKDEKDLYEKQLDLFLDPFDEEV 
IAQAEKDGINQEWITAAQNSPVYKLAIEYKMAFPLHPEFRTMPMVWYCPPLSPIMSYFEG 
ENAGQNPDMIFPAIEEMRLPIQYLANLLTAGDTKPVKEGLQKMAMMRSYMRSQITNQPFD 
TSKLERLGLTERQMTEMYRLLGIAKYEDRFVVPSSHKETYLDTYKAQGSQGYGGEYFGSN 

CEGCGVAVQSGKTGQEIYNENFYGGIFRD*

Sequence 99 

Contig_0447_pos_3585_3070,

is similar to (with p-value 5.0e-62) 
>gp:gp|AF029225|AF029225_3 Staphylococcus carnosus NarG, NarH,
NarJ, and NarI genes, complete cds. NID: g3929521.
atgaattttccagaaaaaatgacatttcatccaaaaatatttgaagaaactatttctaag 
tctcaccccggctacgaagatttgcttgcatatagagaagtcatgatgaattatactttg 
tcagaaattaaagctatctatacagatacatttgattttagtaaaaaacacccactctat
atgacatttaataaatttgacacgcaaaaggaacggggtcaaatgctagctaaattaaag 
gttttatacgaaatgtttggactaaaaatggttgataatgaattatctgattttctccca 
ttgatgctacagtttttgcaagttgctgattttaaaaatgatagtcgagcacaggaaaac 
cttcaacttgtcattatgattattgaagatggtacgtatgaaatggcaaataccctagct 
gaaaacaataatccctatgcatatgttgtcagtgcattaagaaaaacgttaaaagcgtgt
atcgtgcctttgaaagaggtggaaaatcatgcttaa 

Sequence 100 

MNFPEKMTFHPKIFEETISKSHPGYEDLLAYREVMMNYTLSEIKAIYTDTFDFSKKHPLY 
MTFNKFDTQKERGQMLAKLKVLYEMFGLKMVDNELSDFLPLMLQFLQVADFKNDSRAQEN
LQLVIMIIEDGTYEMANTLAENNNPYAYVVSALRKTLKACIVPLKEVENHA*

Sequence 101 

Contig_0447_pos_2768_2400,

is similar to (with p-value 2.0e-50)

>gp:gp|AF029225|AF029225_4 Staphylococcus carnosus NarG, NarH,
NarJ, and NarI genes, complete cds. NID: g3929521.
atgttgttgctaactttaagacgactatccatcaaaaacgttagacgattaagttcattt 
tcagatatatttgtgaatatcgttttgttgattattttaataatgggttgttattctacg 

cttgtaaccaatgcgattcaacctgaatttgattatcgtcaaaccattgcgatatggttt
agacatttattcatgttttctccaaatgctgacttaatgttaaacgtgccttggtcgttt 
aaactgcacatattattagggtttacagtgtttgcgtgttggccatttactcgtttagta 
catgtttggagtgtaccactgtcttatatgaacagaagatatattgtttatcgcaaaaac 
aaaatttaa 




Sequence 102 

MLLLTLRRLSIKNVRRLSSFSDIFVNIVLLIILIMGCYSTLVTNAIQPEFDYRQTIAIWF
RHLFMFSPNADLMLNVPWSFKLHILLGFTVFACWPFTRLVHVWSVPLSYMNRRYIVYRKN
KI* 

Sequence 103 

Contig_0447_pos_2341_1928,

putative peptide of unknown function

atgaatttagataagttgagagcacaagagggttatgattttggtggtatcgctttatat 
gattatcatcacacttcatcaccaattaaatggcaatatgtttcaggtaacacaaatgat 
agatataaacttatcattttgagaaagggtagagggcttgctggaatggtgatgaaaacc
ggtaagcgtatggttattgctgatgtagatacagctttatctccagaagagaaagttaaa 

tttccaatcattcttagtgagtcattgacagctgtagttgcagtccctttatggttagaa
aattcaatgtatggcgttttattattaggtcaaagaaatcatcagccgttacctcagtca 
ttggaccaacttaatattgaaaaacaaatcggtatttttacagaaataaactag 

Sequence 104 

MNLDKLRAQEGYDFGGIALYDYHHTSSPIKWQYVSGNTNDRYKLIILRKGRGLAGMVMKT
GKRMVIADVDTALSPEEKVKFPIILSESLTAVVAVPLWLENSMYGVLLLGQRNHQPLPQS
LDQLNIEKQIGIFTEIN* 

Sequence 105 
Contig_0447_pos_1927_884,

is similar to (with p-value 1.0e-29) 

>sp:sp|P54663|DEGS_BACBR SENSOR PROTEIN DEGS (EC 2.7.3.-).
>gp:gp|L15444|BACDEGSU_1 Bacillus brevis protein kinase (degS)
gene, complete cds; transcriptional activator protein (degU)
gene, complete cds. NID: g710494.

gtggtaaatatgttggagcaaactgatttaagtttagagcaattacttaagaattattat 
gaaaccacgaacgagaaaattgtatttgttaatagacaaggcaaaattattgctatgaat 
gacgcagcaaaagatattttaactgaggaagataattataatgctatgacaaatgcgatt 
tgtcatcgatgcgaaggatactctaatgaatatgatgtacaatcgtgtaaagattgtttt
ttagagacaacgcaattacaacattccaatttccaagtatttatgaagacaaaagataat 
gaaattaagccttttacagctatgtatcaaaatattgatgaacaaagaggtattagtgca 
tttaccttacagaatgtggcgcctcagattgaaaggcaagaaaaaatgtatcaacaaaaa 
atgttacatcgttcaattcaagcacaagaaaatgaacgaaagcgtatttctagagaatta 

catgatagtgtaatacaggatatgctcaatatagatgttgaactaaggcttttgaagtat
aagcacagggataaggtgttagctgaaacatctcaacgtatagaaggcttattatcacag 
cttattgatgatattagaaatatgtctgttgaattaagaccttcttctctcgacgattta 
ggcattgaagcagcttttaaatcatattttaaacagtttgaagaaaattatggtatgcat
attaaatatgattcgaacattaaaggcatgcgttttgataatgaaattgaaacagttgtg 

tatcgtgtagttcaagagggtgtatttaatgctctaaaatatgctgaggttaatgaaatt
gaggtaagtacgcatagtgatggcaagcagcttgtagcagaggttgtggatcgaggtaaa 
gggtttagtttagatcatcaccctaaaggctctggacttggattgtacggaatgagagaa 
cgtgcagaattagttaacggtcatgttaatatagagacacatattaatagaggtactata
attacattagatataccgatttaa 


Sequence 106 

VVNMLEQTDLSLEQLLKNYYETTNEKIVFVNRQGKIIAMNDAAKDILTEEDNYNAMTNAI 
CHRCEGYSNEYDVQSCKDCFLETTQLQHSNFQVFMKTKDNEIKPFTAMYQNIDEQRGISA 
FTLQNVAPQIERQEKMYQQKMLHRSIQAQENERKRISRELHDSVIQDMLNIDVELRLLKY 
KHRDKVLAETSQRIEGLLSQLIDDIRNMSVELRPSSLDDLGIEAAFKSYFKQFEENYGMH
IKYDSNIKGMRFDNEIETVVYRVVQEGVFNALKYAEVNEIEVSTHSDGKQLVAEVVDRGK
GFSLDHHPKGSGLGLYGMRERAELVNGHVNIETHINRGTIITLDIPI* 




Sequence 107 

Contig_0447_pos_855_199,

is similar to (with p-value 5.0e-42)

>gp:gp|AL034446|SC1A9_26 Streptomyces coelicolor cosmid 1A9.
NID: g4007685.

gtgaaaatagttatagcggatgaccatgcagttgttaggacaggattttcaatgatatta 
aattatcaagaagatatggaagttgttgcaactgcagctgacggggttgaagcttatcaa
aaagtgttagaacatcgaccagatgttttaattttagatttgagcatgccgccaggagag 
tcaggcttaatcgcaaccagtaaaatttctgaaagttttcctgatactaaaattttaata 

cttacgatgtttgatgacgaagaatatttatttcatgtgttaaaaagtggtgctaaagga
tacattttaaaaaattcacctgatgagcaattaatattggccgtacgtacagtatatcaa 
ggtgaaacttatgttgatatgaaattgacgacgtctttagtcaatgagtttgtcaatcaa 
tcacaaacggatgaagtgtcatcatcttcagatccatttaaaattttatcgaaacgagag 
ttagaaatattacctcttatagcaaaaggctatggcaataaagatattgcagaaaagttg 

tttgtatcggtgaaaacggtagaggcacataaaacgcatattatgacgaaactaaattta
aagagtaaacctgaattagttgaatatgccttaaagaaaaaattattagaattttaa 

Sequence 108 

VKIVIADDHAVVRTGFSMILNYQEDMEVVATAADGVEAYQKVLEHRPDVLILDLSMPPGE 
SGLIATSKISESFPDTKILILTMFDDEEYLFHVLKSGAKGYILKNSPDEQLILAVRTVYQ
GETYVDMKLTTSLVNEFVNQSQTDEVSSSSDPFKILSKRELEILPLIAKGYGNKDIAEKL
FVSVKTVEAHKTHIMTKLNLKSKPELVEYALKKKLLEF* 

Sequence 109 
Contig_0448_pos_2830_4107,

is similar to (with p-value 2.0e-74) 

>sp:sp|P13702|MVAA_PSEMV 3-HYDROXY-3-METHYLGLUTARYL-COENZYME A
REDUCTASE (EC 1.1.1.88) (HMG-COA REDUCTASE). >pir:pir|A44756|A44756
hydroxymethylglutaryl-CoA reductase (EC 1.1.1.88) - Pseudomonas sp.
>gp:gp|M24015|PSEHMGCOA_1 P.mevalonii HMG-CoA reductase (mvaA)
gene, complete cds. NID: g151258.
atgaaaagtttagataaaggatttagacatttaacacgaaaagataaattaaaaaaactt 
gttgaatacggttggctagatgatgaaaactatgaaatattacttaatcatccgttaatt 
aatgaggaagtcgcaaacagtttaattgaaaatgtcattggtcaaggtgcactaccagta
gggttattacctcgaattatagttgatgataaagaatatgtagtacctatgatggtagag
gaaccttctgtcgtagcagcagcaagttatggcgcaaaactcgttaatcaaagtggtgga 
tttaagacaatttcaagtgaacgtctaatgattggacaaattgtctttgatgatgttgaa 
gacacaggcacattagctaactcaatatatcaaatagaatcacaaattcatcaaatcgct 

gatgaagcttacccttctattaaagcaagaggtggaggatatcaacgtattgaaatagat
acattccctaatcatcgattattatctttgaaggtttttgttgatactaaagatgctatg 
ggtgctaatatgttaaatacaatattagaagcaatcactgcacatctaaaagttaaattt 
tcaaatcaaaatgttttaatgagtattttatctaatcatgcgacagcatcagtagtaaaa 
gtacaaggggaaatagatattgaagatttacatagaggagagagaagtggcgaagaggta 

gcacaacgtatggaacgagcgtcagttcttgcacaagtagatatacatcgtgctgcaaca
cataacaaaggtgtgatgaatggtatacacgctgtagtattggctacaggcaatgataca 
agaggagttgaagcaagtgctcatgcatatgcaagcaaagatggtcattatagagggata
gctacttgggaatatgatcgctcacgtaataaattggttggaactattgaagttcctatg 
actttagcgacagtaggtggaggtacgaaagttttacctattgctaaagcctcattaaat 

ttgcttaatgttgaaaatgcacaggaactagggcaagttgttgctgctgttggattagca
caaaatttctctgcatgtagagcgctagtgtctgaggggatacaacaaggacatatgagt 
ttacaatataaatcattagcgattgttgtaggtgcaaaaggcgaagaaattgcgcaagta 
gctgaagcgctcaaatatgaatcacaagctaatactgccaaagctcaagaaatcttgatg 
aatataagaaagtcataa 




Sequence 110 

MKSLDKGFRHLTRKDKLKKLVEYGWLDDENYEILLNHPLINEEVANSLIENVIGQGALPV 
GLLPRIIVDDKEYVVPMMVEEPSVVAAASYGAKLVNQSGGFKTISSERLMIGQIVFDDVE
DTGTLANSIYQIESQIHQIADEAYPSIKARGGGYQRIEIDTFPNHRLLSLKVFVDTKDAM
GANMLNTILEAITAHLKVKFSNQNVLMSILSNHATASVVKVQGEIDIEDLHRGERSGEEV 
AQRMERASVLAQVDIHRAATHNKGVMNGIHAVVLATGNDTRGVEASAHAYASKDGHYRGI 
ATWEYDRSRNKLVGTIEVPMTLATVGGGTKVLPIAKASLNLLNVENAQELGQVVAAVGLA 
QNFSACRALVSEGIQQGHMSLQYKSLAIVVGAKGEEIAQVAEALKYESQANTAKAQEILM
NIRKS* 

Sequence 111 

Contig_0448_pos_4618_4187,

is similar to (with p-value 2.0e-20)

>gp:gp|U96107|SCU96107_3 Staphylococcus carnosus
N5,N10-methylenetetrahydromethanopterin reductase homolog, SceB
precursor (sceB) and putative transmembrane protein genes, complete
cds, and putative Na+/H+ antiporter NhaC (nhaC) gene, partial cds.
NID: g2735503.

atgaaattcaaaaaattattatctcgtattattatcgctacaatgattacatttactgga 
acactctcatatcaagctattgaacaaacgcatatttcccatgctgcacataattattat 
ggtaaaaaacaatgcacttggtgggcatttaaacgtcgtgctcaattaggtaaacctgta 
tcaaatcgatggggtaatgctaagaattggtatagcaatgcacgtcgatctggttatgca 

actggacataagcctcgaaaatacgctgttatgcaatcaacgagaggctattatgggcac
gtagcagtggttgaaaaagtatataagaatggaaaaatcaaaatttctgaatataattat
aatgtgccattaggctacggcacacgcattattagtaaatcgtctgcacgaaactataat 
tatatttattaa 

Sequence 112

MKFKKLLSRIIIATMITFTGTLSYQAIEQTHISHAAHNYYGKKQCTWWAFKRRAQLGKPV 
SNRWGNAKNWYSNARRSGYATGHKPRKYAVMQSTRGYYGHVAVVEKVYKNGKIKISEYNY 
NVPLGYGTRIISKSSARNYNYIY*
Sequence 113

Contig_0448_pos_2534_1422,

is similar to (with p-value 2.0e-31) 

>sp:sp|P40830|PKSG_BACSU PUTATIVE POLYKETIDE BIOSYNTHESIS
PROTEIN PKSG. >gp:gp|U11039|BSU11039_2 Bacillus subtilis W168
polyketide synthase (pksX and pksorfx6) genes, complete cds.
NID: g602656. >gp:gp|Z99112|BSUB0009_183 Bacillus subtilis
complete genome (section 9 of 21): from 1598421 to 1807200.
NID: g2633902. >gp:gp|Z99113|BSUB0010_7 Bacillus subtilis
complete genome (section 10 of 21): from 1781201 to 2014980.
NID: g2634090.

atggctaaacttgcagaagcgcgccaagtcgatcctaataaatttttaattggaattggt 
caaactgaaatgactgtgagcccagtgaatcaagatatcgtatctatgggagccaatgct 
gctaaagatattataacagaagaagataaaaagaatattggtatggttatagtagcaact 

gagtctgcgattgataatgccaaagcagcagccgttcaaattcaccatcttttaggtatt
caaccctttgcaagatgctttgaaatgaaagaggcttgttatgcagcaacacctgcaatt 
caacttgccaaagattatcttgctcaacgccctaacgaaaaggttcttgtcattgctagt 
gacacagctcgttatggtattcattctggtggtgagcctactcaaggtgccggtgcagtt 
gcaatgatgatttcacataacccaagtattttaaaacttaatgatgatgccgtagcatat 

actgaagacgtttatgatttctggcgtccaacgggtcatcaatatcccttagttgctggt
gcattgtcgaaagatgcctatatcaagtcattccaagaaagttggaatgaatatgcacgt 
cgccataataaaacactcgctgatttcgcttcactatgtttccatgtaccattcaccaaa 
atgggacaaaaagctttagattctattattaatcatgccgatgaaactacacaagaccgt 
cttaactctagttaccaagatgcagttgattataatcgttatgtcggtaatatttacaca 

gggtccttatatttaagtctcatctctttattagaaacacgtgatttaaaaggcggacaa
acgattggtctctttagttatggttctggttctgtaggcgagttctttagtggaacatta 
gtagatggattcaaggagcaattagatgttgagcgccacaaatttttattaaataataga 
atagaggtttctgttgatgaatatgaacatttcttcaaacgctttgaccaattagaattg 
aatcatgaacttgaaaaatcaaatgcagatcgtgacattttctatttaaaatctattgat

aacaatattcgtgaatatcatatagcagaataa

Sequence 114 

MAKLAEARQVDPNKFLIGIGQTEMTVSPVNQDIVSMGANAAKDIITEEDKKNIGMVIVAT 
ESAIDNAKAAAVQIHHLLGIQPFARCFEMKEACYAATPAIQLAKDYLAQRPNEKVLVIAS 
DTARYGIHSGGEPTQGAGAVAMMISHNPSILKLNDDAVAYTEDVYDFWRPTGHQYPLVAG
ALSKDAYIKSFQESWNEYARRHNKTLADFASLCFHVPFTKMGQKALDSIINHADETTQDR 
LNSSYQDAVDYNRYVGNIYTGSLYLSLISLLETRDLKGGQTIGLFSYGSGSVGEFFSGTL 
VDGFKEQLDVERHKFLLNNRIEVSVDEYEHFFKRFDQLELNHELEKSNADRDIFYLKSID 
NNIREYHIAE* 


Sequence 115 

Contig_0449_pos_584_919,

is similar to (with p-value 3.0e-38) 

>sp:sp|P42874|URE2_STAXY UREASE BETA SUBUNIT (EC 3.5.1.5)
(UREA AMIDOHYDROLASE). >pir:pir|S38484|S38484 urease (EC 3.5.1.5)
beta chain - Staphylococcus xylosus >gp:gp|X74600|SXUREABC_2
S. xylosus gene for ureA, ureB, and ureC genes for urease
gamma, beta and alpha subunits. NID: g410513.
gtgattgaagtaaaaaatacaggcgatagacctatacaagtaggttcacatttccacttt 
ttcgaagcaaataaagcattagaatttgatcgtgagaaagcatatggtaaacatttggat
attcctgcaggagctgcagtgagatttgaacctggagatgaaaaaaaagtacaacttgtc 
gaatattctggacgacgtaaaatttatggattccgtggtttagtcgatggcgatattgac 
gaagaacgcgtattccgtccaaatgattcaaatcaaaacgccgccgttaaaaacgatgca 
ggcgaagacaatgcgaataaaaaaggtggtaaataa 


Sequence 116 

VIEVKNTGDRPIQVGSHFHFFEANKALEFDREKAYGKHLDIPAGAAVRFEPGDEKKVQLV
EYSGRRKIYGFRGLVDGDIDEERVFRPNDSNQNAAVKNDAGEDNANKKGGK* 
Sequence 117

Contig_0449_pos_922_2637,

is similar to (with p-value 0.0e+00) 

>sp:sp|P42873|URE1_STAXY UREASE ALPHA SUBUNIT (EC 3.5.1.5)
(UREA AMIDOHYDROLASE). >pir:pir|S38485|S38485 urease (EC 3.5.1.5)
62K chain - Staphylococcus xylosus >gp:gp|X74600|SXUREABC_3
S. xylosus gene for ureA, ureB, and ureC genes for urease
gamma, beta and alpha subunits. NID: g410513.
atgagttttaaaatgacacaatctcaatacacaagtctttatggaccaactgtaggagac 

tctgtgagattaggagatacgaacttgtttgcacaagttgaaaaagactatgcaaattat
ggagatgaagctactttcggtggcggaaaatcaattcgtgatggtatggctcaaaatcct 
aatgtgacaagagatgataaaaatgtagccgatttagttttaactaacgcattaattatt 
gattatgacaagattgttaaagcagatatcggaattaaaaatggttatatttttaagatc
ggtaaagctggaaacccagatataatggataacgttgacatcatcattggtgcaacaact 

gatattattgctgctgaaggtaaaattgttactgccggcggtatcgatacacacgtgcac
ttcatcaatcctgaacaagctgaagttgcacttgagagtggtattacaacgcatatcggt 
ggaggaactggtgcttctgaaggtgctaaagcgactactgtaacaccaggaccttggcat 
attcatcgcatgttagaagcagcagaagagatgcctattaatgtaggatttactggtaaa 
ggtcaagctgtcaatcatactgcacttattgaacaaattcatgcaggcgctataggtctt 

aaagtacatgaagattggggagctacaccttcagcattaagtcatgcattagacgttgca
gatgagtttgatgttcaagtcgctttacatgcagatacattaaatgaagctggatttatg 
gaagatacaatggctgctgtgaaagatcgtgtattgcatatgtatcatactgaaggagct 
ggtggtggtcatgcacctgacttaatcaaatcagctgcatattcaaacatcttaccttct 
tctacaaacccaacattaccttacactcacaacactgtagatgaacatttagacatggtt 

atgattactcaccatcttaatgcttcaataccagaagacattgcatttgcagattctcgt
atacgtaaggaaactatagcagcagaagacgtattacaagatatgggcgtatttagtatg
gtaagttcagattcacaagcaatgggacgtgtcggtgaagttgtaacacgtacttggcaa 
gttgcacaccgtatgaaagaacaacgcggaccattagatggtgactttgaatatcacgat
aataatcgtattaaacgttacattgcaaaatatacaatcaatcctgccattacacatggt 

atttctgactatgttggatctgtagaagcgggtaaacttgccgatttagtaatgtgggaa
ccagaattcttcggtgccaaacccgatcttgttgttaaaggtggcatgattaactcagca 
gtaaatggtgatgctaatggctccataccaacatcagagcctttgaaatatcgcaaaatg 
tatggtcaatttggtggtaacattacacatactgctatgacttttgtttctaacactgca 
tatgaaaacggtatttatcgtcaactcaatctaaaacgaatggttcgaccagttagaaat 

attagaaatttaactaaggcagatatgaaaaataataatgctacacctaaaatagatgta
gatccacaaacatatgaggtattcgttgatggtaataaaatcacaagtgaagcagcaaca 
gaattaccattaacacaaagatacttcttattctag 

Sequence 118 

MSFKMTQSQYTSLYGPTVGDSVRLGDTNLFAQVEKDYANYGDEATFGGGKSIRDGMAQNP
NVTRDDKNVADLVLTNALIIDYDKIVKADIGIKNGYIFKIGKAGNPDIMDNVDIIIGATT
DIIAAEGKIVTAGGIDTHVHFINPEQAEVALESGITTHIGGGTGASEGAKATTVTPGPWH 
IHRMLEAAEEMPINVGFTGKGQAVNHTALIEQIHAGAIGLKVHEDWGATPSALSHALDVA 
DEFDVQVALHADTLNEAGFMEDTMAAVKDRVLHMYHTEGAGGGHAPDLIKSAAYSNILPS 

STNPTLPYTHNTVDEHLDMVMITHHLNASIPEDIAFADSRIRKETIAAEDVLQDMGVFSM
VSSDSQAMGRVGEVVTRTWQVAHRMKEQRGPLDGDFEYHDNNRIKRYIAKYTINPAITHG 
ISDYVGSVEAGKLADLVMWEPEFFGAKPDLVVKGGMINSAVNGDANGSIPTSEPLKYRKM
YGQFGGNITHTAMTFVSNTAYENGIYRQLNLKRMVRPVRNIRNLTKADMKNNNATPKIDV 
DPQTYEVFVDGNKITSEAATELPLTQRYFLF* 


Sequence 119 

Contig_0449_pos_2651_3103,

is similar to (with p-value 2.0e-48) 

>sp:sp|Q07401|UREE_BACSB UREASE ACCESSORY PROTEIN UREE.
>pir:pir|D36950|D36950 ureE protein - Bacillus sp. (strain TB-90)
>gp:gp|D14439|BACUREA_4 Thermophilic Bacillus genes for urease
subunits and urease accessory proteins, complete cds.
NID: g393296.

atgattatagaagaaattcaaggaaatattgctaatttatctcaagatgaaaagcaaaaa 
catgtcgaaaaagtttatcttgaaaactcagatttggttaaacgtatacaacgtgttaaa
acagatcacggtaatgaaatagggatacgtcttaaacaacctattgacctacaatatggt 
gatattttatatcaagacgatacaaacatgattattgtcgatgttaatagcgaagactta 
ttagttattaaacctagaaatttaaaggaaatgggagacattgctcatcaactaggtaat 
cgccatctgcctgcccaatttacagaaactgaaatgcttattcaatatgactatcttgtt
gaagatttattaaaagagttgggtatcccctactcacatgaagacagaaaggtcaatcaa 
gcatttcgacatataggacattcacatgattga 

Sequence 120 

MIIEEIQGNIANLSQDEKQKHVEKVYLENSDLVKRIQRVKTDHGNEIGIRLKQPIDLQYG
DILYQDDTNMIIVDVNSEDLLVIKPRNLKEMGDIAHQLGNRHLPAQFTETEMLIQYDYLV 
EDLLKELGIPYSHEDRKVNQAFRHIGHSHD*

Sequence 121 
Contig_0449_pos_1467_1159,

is similar to (with p-value 4.0e-37) 

>sp:sp|P42873|URE1_STAXY UREASE ALPHA SUBUNIT (EC 3.5.1.5)
(UREA AMIDOHYDROLASE). >pir:pir|S38485|S38485 urease (EC 3.5.1.5)
62K chain - Staphylococcus xylosus >gp:gp|X74600|SXUREABC_3
S. xylosus gene for ureA, ureB, and ureC genes for urease
gamma, beta and alpha subunits. NID: g410513.
atgccaaggtcctggtgttacagtagtcgctttagcaccttcagaagcaccagttcctcc 
accgatatgcgttgtaataccactctcaagtgcaacttcagcttgttcaggattgatgaa
gtgcacgtgtgtatcgataccgccggcagtaacaattttaccttcagcagcaataatatc 

agttgttgcaccaatgatgatgtcaacgttatccattatatctgggtttccagctttacc
gatcttaaaaatataaccatttttaattccgatatctgctttaacaatcttgtcataatc 
aataattaa

Sequence 122 

MPRSWCYSSRFSTFRSTSSSTDMRCNTTLKCNFSLFRIDEVHVCIDTAGSNNFTFSSNNI
SCCTNDDVNVIHYIWVSSFTDLKNITIFNSDICFNNLVIINN* 

Sequence 123 

Contig_0450_pos_6860_7486,

is similar to (with p-value 8.0e-57)

>nrl3d:pir||1GPHA Glutamine phosphoribosylpyrophosphate (prpp)
Amidotransferase (EC 2.4.2.14), chain A - Bacillus subtilis
>nrl3d:pir||1GPHB Glutamine phosphoribosylpyrophosphate (prpp)
Amidotransferase (EC 2.4.2.14), chain B - Bacillus subtilis
>nrl3d:pir||1GPHC Glutamine phosphoribosylpyrophosphate (prpp)
Amidotransferase (EC 2.4.2.14), chain C - Bacillus subtilis
>nrl3d:pir||1GPHD Glutamine phosphoribosylpyrophosphate (prpp)
Amidotransferase (EC 2.4.2.14), chain D - Bacillus subtilis

atggtaataggcgtacctaattcatcattatctgcagcaagtggttatgctgaagaaata
ggcctaccatatgaaatgggactagttaaaaatcaatatgttgctcgaacttttatacaa 
cctactcaggaattaagagagcaaggtgtacgtgtgaaactgtcggctgttaaggatatt 
gttgatggtaaagatatcgtacttgtagatgattcgattgttcgaggtacaacgattaaa 
cgcatagttaaaatgcttaaggattcaggagctaaccgcattcacgtaagaattgcttct 

cccgaattcatgttccctagtttttatggtattgacgtatctacaacagctgaactcatc
tcagcaagtaagtctcctgaggaaattaaaaatcatattggtgcagattctcttgcttat 
ttaagcgttgatggcttaatcgagtctataggacttgattatgatgcgccatatcatggc 
ttgtgtgtagaaagttttacaggtgattatccagcaggactttacgattatgagaaaaat 
tataaaaagcatttaagtgaacgtcaaaaatcatatatagctaataataaacattatttt 

gatagtgagggaaatttacatgtctaa

Sequence 124 

MVIGVPNSSLSAASGYAEEIGLPYEMGLVKNQYVARTFIQPTQELREQGVRVKLSAVKDI 
VDGKDIVLVDDSIVRGTTIKRIVKMLKDSGANRIHVRIASPEFMFPSFYGIDVSTTAELI 
SASKSPEEIKNHIGADSLAYLSVDGLIESIGLDYDAPYHGLCVESFTGDYPAGLYDYEKN
YKKHLSERQKSYIANNKHYFDSEGNLHV* 

Sequence 125 
Contig_0450_pos_14672_15445,

putative peptide of unknown function 

atggaccatagttccgcttcgaaaaaattaattaaagatatagagcaaaatcagtatgta 
acagtaaaacatttatctcatgatgatttttatattgatgatttggtaaaaaagaaagaa 
gtcattgcaagccttgaaatacctaaggatttctcaaaacaccttaaagataatgattta 

aataagactcttccattatatagcagagatgattttataggacatattgctatggaaata
atcagtcgatcattatacgaacaacaaatccctaatattattcatgaacatcttgatgat 
atgaaacaaccacaatccttagataaagtgaaacaatcttattattcgcttacacctcaa 
tctaaaataaaaagtgtagctatcaataaacatgctcatcaatccatttcaattggcatt 
gtatttgttgtcgtcatctttgtaagtgttatccaaatcctattacatcaacgtcttaaa 

cagaacgcacctctcgaaagattatatttggtaccttatagtcaacttaaactatacttg
acttatatcagtgtacatgtggtcatactgatgctcatgctattgatgattagcctttta 
atgcatcaaccattaagcattcttttctacttaaaaacactgattatagttttattttat 
gaggcaggtattgctttattactttttaaaattaatgttcttagtcaccgtatattcatg 
gctattatttatacggtagcgataggtattatatatttatggattcaattgtaa 


Sequence 126 

MDHSSASKKLIKDIEQNQYVTVKHLSHDDFYIDDLVKKKEVIASLEIPKDFSKHLKDNDL 
NKTLPLYSRDDFIGHIAMEIISRSLYEQQIPNIIHEHLDDMKQPQSLDKVKQSYYSLTPQ
SKIKSVAINKHAHQSISIGIVFVVVIFVSVIQILLHQRLKQNAPLERLYLVPYSQLKLYL 
TYISVHVVILMLMLLMISLLMHQPLSILFYLKTLIIVLFYEAGIALLLFKINVLSHRIFM
AIIYTVAIGIIYLWIQL*

Sequence 127 

Contig_0450_pos_17961_18629,
is similar to (with p-value 3.0e-41)

>sp:sp|P40815|T3RE_SALTY TYPE III RESTRICTION-MODIFICATION
SYSTEM STYLTI ENZYME RES (EC 3.1.21.5). >pir:pir|JN0658|JN0658
restriction endonuclease (EC 3.1.-.-) - Salmonella typhimurium

atgcttaacatcatgacaacaaaaataatgattccaacgatatcaaattttttattatat
gcagactcatctttagattctggtatacctctaagcaatatgagtgaattaagtttaaat 
aatataataagagaatttaacaagcgttttgaagaaaaatatagtcaaagttatgaatat 
aaaaaattagatttttctgctactacaaccatttatgattcagaaatatcagagtttaaa
gattgggtagatgcaaattatttaggtactaacgttgaaaataacattcaaactgaaaaa 

agatttttatatgaaagaccaccagttagatatgatagtgtaacacctgagttagagttg
ttaaaaagaaattacgataaaaatgtaactgtatttggtaatttgcctaaaaaagcgata 
caagttcctaaatatactggtggcactactacgcctgattttgtctatatgatagaaact 
gatgaacaagatgcaaaataccttattgttgaaacaaaagcagaaaacatgagactagga 
gataaaagtattggtgaaatacaaaaaaaattctttaacacattagataatttgaatatt 

aaatatcaattagctactagcgcgcaagatgtttataatgaaattaaaaaattagatgat
tcaaagtga 

Sequence 128 

MLNIMTTKIMIPTISNFLLYADSSLDSGIPLSNMSELSLNNIIREFNKRFEEKYSQSYEY 
KKLDFSATTTIYDSEISEFKDWVDANYLGTNVENNIQTEKRFLYERPPVRYDSVTPELEL
LKRNYDKNVTVFGNLPKKAIQVPKYTGGTTTPDFVYMIETDEQDAKYLIVETKAENMRLG 
DKSIGEIQKKFFNTLDNLNIKYQLATSAQDVYNEIKKLDDSK* 

Sequence 129 
Contig_0450_pos_18636_19928,

putative peptide of unknown function 

atgatggggaaatcagaaaaaatttcattacttgaaaaagtccaagatggtttagtagat 
aaaaccggagcaaaacctaagttaccaacaactattaccgatttaaatcaagaaacttta 
gaggtatatagaataccactaaagtttctgtattataatgatagaaatggaagaattgct 
tctgtaatatccagagtaagcgacgatataaaagttgcttatgaatttgaagataataat
tataataaaagtattgaaaatatgatatatgaggctaatacttctgctttaaaaaacact 
aaaaaatctattaaagataaaggtcagcaagtatttggttatgtattagatgatggtaga 
gttattgacggaaatagaaggtttactgcgcttagacagttagaacaagaaacaggaaat
actttttattttgaagctgttattttaccatttacttatgataaaaagactgatcgagct
aaaatcaagcaattagaacttgcaatacaaatgggtatagaaggaaagcaagattatgat 
aaagttgatgaagcggtggatatttatcaaacaattgaagttgaaaagttgatgactgta 
gcggactatgcaaatgagtcaaataaaacaaaaaaaactattgaaaagcaattaggttct 
gcaaaattaataagaaaatttttagattttattaatgctcaagagaattcttattatatt 

ataaaagatgctggcatttattcattatttgaagaagcagtacctaagttagataaagct
tatcctaaaggtggaccttcgttagaagatgctattgaaaaattttttagttttgtcctt 
ttgcaaattcaatcagggacaagcacacgggcatatgctggaagagattattttgaaaat 
atcgttttttcaaatgagggtagtcatcaatttaatacagaaacagaagatgctatcgat 
ggtttgagagataaattagaagaaaaacgtgtagaatctaccgcagatttaaagagtaca 

ctaagccaatcgatacctgaactacgagaagtaagttcttcatataacaaagttgtaagc
aaaagtaaacgaaatgctaatgtggaaagttttattgaaaatgttaaatcaatgtctgaa 
agtctcaatgatatggaaaaaggcaatggtttacctagtagtcttaattttgaacaattt 
aatctaaaacaattaaaagaaattagagaaatgctaataagaataaataattgtagtaga 
gagttgatagatatttatgaacatgaaatctga 


Sequence 130 

MMGKSEKISLLEKVQDGLVDKTGAKPKLPTTITDLNQETLEVYRIPLKFLYYNDRNGRIA 
SVISRVSDDIKVAYEFEDNNYNKSIENMIYEANTSALKNTKKSIKDKGQQVFGYVLDDGR
VIDGNRRFTALRQLEQETGNTFYFEAVILPFTYDKKTDRAKIKQLELAIQMGIEGKQDYD
KVDEAVDIYQTIEVEKLMTVADYANESNKTKKTIEKQLGSAKLIRKFLDFINAQENSYYI
IKDAGIYSLFEEAVPKLDKAYPKGGPSLEDAIEKFFSFVLLQIQSGTSTRAYAGRDYFEN
IVFSNEGSHQFNTETEDAIDGLRDKLEEKRVESTADLKSTLSQSIPELREVSSSYNKVVS
KSKRNANVESFIENVKSMSESLNDMEKGNGLPSSLNFEQFNLKQLKEIREMLIRINNCSR 
ELIDIYEHEI* 


Sequence 131 

Contig_0450_pos_22761_22258,
putative peptide of unknown function 

atgaatacaatcaaaagtacgatacacacagaagcgatttttagcgatgatgaacaacac 
cgatacttacttaaaaagacgtggaatgaaaagaagcccacatgtacagtgataacgatg
tcccctcatttagacggcatattatcactcgatcttacaactgttcttatcctcaatcaa 
ttagcgaattcagaacgatacggtgctgtatatttagtgaatttattttcgaatattaaa 
accccagataatctcaaacatattaaagagccttatgataaacatacagacagacactta 
atgaaagcaataagtgagagtgacacagtaattctagcttatggagcctatgcgaagcga 
ccatttgttatcgaacgtgttgaacaagtgatggaaatgttgaagcctcacaaaaagaaa
attaaaaagctcataaacccagcaacaaatgaaatcatgcacccactcaatcctaaagca 
cgccaaaaatggacattgaaataa 

Sequence 132 

MNTIKSTIHTEAIFSDDEQHRYLLKKTWNEKKPTCTVITMSPHLDGILSLDLTTVLILNQ
LANSERYGAVYLVNLFSNIKTPDNLKHIKEPYDKHTDRHLMKAISESDTVILAYGAYAKR 
PFVIERVEQVMEMLKPHKKKIKKLINPATNEIMHPLNPKARQKWTLK* 

Sequence 133 
Contig_0450_pos_17953_17219,

putative peptide of unknown function 

gtgattacgcaaggagatagaattggatggttaaatcctcttatattgatattaattgct
atattcattgtgacattaattgcattttatatatttgaaaaacgtcaagatgaacctttt 
atagatttaagtttattttcaaataatgtttatattggaacaacattagccaacttgatg 
gtgaacatggatattggttcattagcattatttaatatttatgttcaagacgataaacat
ctatcagctgcacaagccggtttaattacaattccatatatgctgtgtagtttgttaatg 
attcgtgttggtgaacgttttatgcaaaaaagaggaccgcaattgccattgatgttaggt 
ccggtatcaattactgttggtattatacttttagcattcacttctttgcctaatatgatt 
tattatattgtggcatgtattggctttatctttataggtctaggattaggattttttgct 
acacccgcgctatctacagctgtatctaatgttccagctgaaaaagcaggtactgcatca
ggaattatcaaaatgacttctacactaggtgcagcatttggaatcgctgttgtgacaaca 
atatatacggcattatctgtaaatcacccggcatatttagcagctactatcgcatttatc 
gtgggtgcaggtttagtgtttatcgcatttattgcggcgtattgtttaattcctaaaaag 
aatgtagatatctaa

Sequence 134 

VITQGDRIGWLNPLILILIAIFIVTLIAFYIFEKRQDEPFIDLSLFSNNVYIGTTLANLM 
VNMDIGSLALFNIYVQDDKHLSAAQAGLITIPYMLCSLLMIRVGERFMQKRGPQLPLMLG 
PVSITVGIILLAFTSLPNMIYYIVACIGFIFIGLGLGFFATPALSTAVSNVPAEKAGTAS
GIIKMTSTLGAAFGIAVVTTIYTALSVNHPAYLAATIAFIVGAGLVFIAFIAAYCLIPKK
NVDI* 




Sequence 135 

Contig_0450_pos_11730_11380,
putative peptide of unknown function 

atgattggtatgtcgtttaatcaactaggtgcttttaaagaagctttaccatttttaatg 
actgcagctgaaatggacgatgatagagatttagaagtacagtttcaatatgggttagta
ctatgccaactcgaaatgtttgatgaagctattaaacaattaaataaggttctttctatc 
gattcacagcacgtagatggtatatataatcttggtttagcaacatatatgaaaaatgaa 
aatttagatgaagcaattgcatattttgaacaagcaatatcaattgatgaaaaacattta
cttagtcaacatgcattaaagacattcaaaacaatgaaagaggaggaataa 


Sequence 136 

MIGMSFNQLGAFKEALPFLMTAAEMDDDRDLEVQFQYGLVLCQLEMFDEAIKQLNKVLSI
DSQHVDGIYNLGLATYMKNENLDEAIAYFEQAISIDEKHLLSQHALKTFKTMKEEE*

Sequence 137

Contig_0450_pos_11378_9129,
putative peptide of unknown function 

atgtctgaccctacactttttgattattcaatgatcaaaggtacagttgatgctatttta 
tttcaaaatacggataatttttatactgttctaaaagtagatactatagaatcaaatgaa 

aaatttgatagtatgccaactgtggtagggtttcttcccaatgtagttgaaggcgatgtt
tatacttttaaagggcaagtcgtacaacatccacgttatggtaagcaattaaaggctgaa 
acatttgaaaaagaattacctcaaactaaagaagccattattagttacttatcaagtgat 
ttatttaaaggcatcggtaaaaaaacggctcaaaacattgtaaatacactaggtgaaaat 
gctataaatgatattttaactcgtccagaaatcttagaaagtgtacctagtttaccaaag 

aagaaacaaaagcaaattgctgatcagattaatgcaaaccaagaatctgagaaaattatg
atacgtttacacgacctagggtttggtccgaaattatcaatggctatatatcagttctat 
atgggtgatactttaaatgtcttagataaaaatccttaccaattagtatatgacattaaa 
ggtattggttttaataaggctgaccaacttgctcgaaatgtcggtattgagccacattca 
cctgaaagattaaaagcagcattattatttacgttagaagaagaatgtatcaaacaagga

catacatatctacctcgtacaattgttatagaaacaacacaaaatttactcaatgaagat
attgagaaaccaattgaaacagagcaattactagaaatcattgacgttttatcagaagag 
aaaaaattaatatctgaagctgatcaggtatcaattccaagtttatactattcagaattg 
aaaagtgtgcaaaacttataccgaattaaaacaaacacatctaaattaaaagaaatagaa 
cagtctgatttacaaatacatattggtgatattgagtcacaaaatgaggttaattactct 

gcctctcaaaaagaagcgcttgaaacagcaataaattctaaaattatgcttttaactggt
ggtccgggtaccggtaaaaccacagtcattaaaggtatagttgaattatatgcagaaata 
catgggctctcgctcgattatgatgattacaatgaagatgattatccagtagtgttagct 
gcacccactggtcgtgcttctaagcgccttcacgaatcgacaggtttagaagcaatgaca 
attcatcgtttaatcggttggaaccaagatacacaaccacaggatattttagaaaatgag 

atcaatgcaagactcattatcatcgatgaaatgtcaatggtagatacttggttgttccat
caatttttaagcgctgtgcctttagaagcacaaattgtatttgtcggagatgaagatcag 
ttaccatcagtaggtccaggacaggtatttaaagaccttattgattctgaaataataccg 
cgtgttaatcttaccgaagtatatcgtcagcaagatggttccagtattattgacttagct 
caccgtatgaaattaaatgaacctatcgatattactaaacgttatcatgatcgtagtttt 
attcgttgtggtacgaatcaaattccagacgttgttgataaagtagttaaaagcgctgta
gctaaaggctatgatatgagtgatatacaagttttggctcctatgtataaaggtaacgct 
ggtattaagagacttaaccaagttctacaatctattcttaatccgaagcaacaagatgat 
cgtgaaatagaatttggtgaagctgtgtttagaaaaggggataaagtacttcagttagtt 
aatcgacctaatgataatatatttaatggggatataggtataatagtaggtatattttgg
gccaaagaaaatgctctaaataaggatgtgttagttgtagattttgaaggtaatgaaatt 
acatttactaaacaagatttaatggaactaacacatgcatattgtacatctatccataaa 
tcacaaggttcagaatttcctattgtaattatgcctattgttagacaatattataggatg 
ttacaacgtcccattctttatacaggattaactagagctaaacaatcacttgttttatcg 
ttaaagagagatatacatttttatttatttttaagatttctcagaaaaattaggtttttc
tcatttaattttaatcttagccatttataa 



Sequence 138

MSDPTLFDYSMIKGTVDAILFQNTDNFYTVLKVDTIESNEKFDSMPTVVGFLPNVVEGDV 
YTFKGQVVQHPRYGKQLKAETFEKELPQTKEAIISYLSSDLFKGIGKKTAQNIVNTLGEN 
AINDILTRPEILESVPSLPKKKQKQIADQINANQESEKIMIRLHDLGFGPKLSMAIYQFY 
MGDTLNVLDKNPYQLVYDIKGIGFNKADQLARNVGIEPHSPERLKAALLFTLEEECIKQG 

HTYLPRTIVIETTQNLLNEDIEKPIETEQLLEIIDVLSEEKKLISEADQVSIPSLYYSEL
KSVQNLYRIKTNTSKLKEIEQSDLQIHIGDIESQNEVNYSASQKEALETAINSKIMLLTG 
GPGTGKTTVIKGIVELYAEIHGLSLDYDDYNEDDYPVVLAAPTGRASKRLHESTGLEAMT 
IHRLIGWNQDTQPQDILENEINARLIIIDEMSMVDTWLFHQFLSAVPLEAQIVFVGDEDQ 
LPSVGPGQVFKDLIDSEIIPRVNLTEVYRQQDGSSIIDLAHRMKLNEPIDITKRYHDRSF
IRCGTNQIPDVVDKVVKSAVAKGYDMSDIQVLAPMYKGNAGIKRLNQVLQSILNPKQQDD
REIEFGEAVFRKGDKVLQLVNRPNDNIFNGDIGIIVGIFWAKENALNKDVLVVDFEGNEI
TFTKQDLMELTHAYCTSIHKSQGSEFPIVIMPIVRQYYRMLQRPILYTGLTRAKQSLVLS 
LKRDIHFYLFLRFLRKIRFFSFNFNLSHL* 

Sequence 139

Contig_0450_pos_5605_5021,

is similar to (with p-value 8.0e-52) 

>sp:sp|P54378|GCST_BACSU PROBABLE AMINOMETHYLTRANSFERASE (EC
2.1.2.10) (GLYCINE CLEAVAGE SYSTEM T PROTEIN). >gp:gp|D8443
2|BACJH642_194 Bacillus subtilis DNA, 283 Kb region containi
ng skin element. NID: g2627063. >gp:gp|Z99116|BSUB0013_168 B
acillus subtilis complete genome (section 13 of 21): from 23
95261 to 2613730. NID: g2634723.

atggcaatgtttgaattcaaacagaacgtacaaatctttggtaaatctattattctttcg 
cagtctggttatactggagaagatggctttgaaatttactgtaagcaagaagatactaag
gatatatgggagcaattattagaatacgatgttacaccatgcggtttaggtgctcgtgat 
acgctaagacttgaagcaggattacctttacatggtcaagatttatctgaatcaattact 
ccttatgaaggagggatagccttcgctgctaaaccgttaattgaaaatcattttattggc 
aaatccgtactcaaagctcaaaaagaaaatggttccgagcgtagaacagtaggtcttgaa 
ctattaggtaaaggcattgctagaacaggttatgacgtactagatgaaaatagtaatgaa
attggtttcgttacatcaggaacacaatccccatcttctggtaaatctatagcacttgca 
ataatagatagagatgcatttgaaatgggcaaaaaagtaattgtgcaaatacgtaagcgt 
caagttgaggcaaaaatagttaaaaaaaatcaaattgagaaataa

Sequence 140

MAMFEFKQNVQIFGKSIILSQSGYTGEDGFEIYCKQEDTKDIWEQLLEYDVTPCGLGARD 
TLRLEAGLPLHGQDLSESITPYEGGIAFAAKPLIENHFIGKSVLKAQKENGSERRTVGLE 
LLGKGIARTGYDVLDENSNEIGFVTSGTQSPSSGKSIALAIIDRDAFEMGKKVIVQIRKR 
QVEAKIVKKNQIEK* 


Sequence 141 

Contig_0450_pos_4620_3655,

is similar to (with p-value 3.0e-97) 

>sp:sp|P54376|GCS1_BACSU PROBABLE GLYCINE DEHYDROGENASE (DEC
ARBOXYLATING) SUBUNIT 1 (EC 1.4.4.2) (GLYCINE DECARBOXYLASE)
(GLYCINE CLEAVAGE SYSTEM P-PROTEIN). >gp:gp|D84432|BACJH64
2_195 Bacillus subtilis DNA, 283 Kb region containing skin e
lement. NID: g2627063. >gp:gp|Z99116|BSUB0013_167 Bacillus s
ubtilis complete genome (section 13 of 21): from 2395261 to
2613730. NID: g2634723.

atggatgtagcaaattcttctatgtatgatggtatgactagttttgctgaagcatgtata 
ttggcactaagtcatacgaaaaaaaataaaattgtagtttcaagtggactacattatcaa 
gctttacaaattctacacacatacgccaaaactcgtgatgaatttgaaataattgaagtt 

gatcttaaaggtactattactgatttagagaaattagaacaacttatcgatgacaacaca
gcagctgtcgctgtccaatatcccaatttttatggttctattgaagatttagaacaaatt 
aataactatataaaggataaaaaagctttatttatcgtatatgccaatccactttcttta
ggattactaacacccccaggtacattcggggcagacatagtagtgggagatacacagcct 
tttggtattcctacacaatttgggggtccgcattgtggatactttgctacaacaaagaaa 

ttaatgagaaaagtacctggtcgattagttgggcaaactcaagatgacgaaggtaatcgt
ggatttgttctcacgttacaagctagagaacaacatatccgccgtgataaagcaacttct 
aatatttgttcaaatcaagctttaaatgcacttgcatcttcaatagcaatgtcagcttta
ggtaaacaaggtatttatgaaattgcagttcaaaatcttaaaaatgccaattatgccaaa 
aataagtttgaagaacatggttttgaggtactaaaagcacaatcttttaatgaatttgta 

gtcaaatttaatcaaccaataaaaaatattaatcttaaattagcagaatatggatatatt
ggtggttttgacttaggtgaagtatctgatgattttaaaaaccatatgttagtagcagtt 
acagagttaagatctaaagatgaaatcgatgatttcgttacgaaagcaggtgagttaaat 
gattag 

Sequence 142

MDVANSSMYDGMTSFAEACILALSHTKKNKIVVSSGLHYQALQILHTYAKTRDEFEIIEV
DLKGTITDLEKLEQLIDDNTAAVAVQYPNFYGSIEDLEQINNYIKDKKALFIVYANPLSL
GLLTPPGTFGADIVVGDTQPFGIPTQFGGPHCGYFATTKKLMRKVPGRLVGQTQDDEGNR
GFVLTLQAREQHIRRDKATSNICSNQALNALASSIAMSALGKQGIYEIAVQNLKNANYAK
NKFEEHGFEVLKAQSFNEFVVKFNQPIKNINLKLAEYGYIGGFDLGEVSDDFKNHMLVAV
TELRSKDEIDDFVTKAGELND* 

Sequence 143 

Contig_0450_pos_3419_2154,
is similar to (with p-value 0.0e+00)

>sp:sp|P54377|GCS2_BACSU PROBABLE GLYCINE DEHYDROGENASE (DEC
ARBOXYLATING) SUBUNIT 2 (EC 1.4.4.2) (GLYCINE DECARBOXYLASE)
(GLYCINE CLEAVAGE SYSTEM P-PROTEIN). >gp:gp|D84432|BACJH64
2_196 Bacillus subtilis DNA, 283 Kb region containing skin e
lement. NID: g2627063. >gp:gp|Z99116|BSUB0013_166 Bacillus s
ubtilis complete genome (section 13 of 21): from 2395261 to
2613730. NID: g2634723.

atgaaatataatcctaaaatcaatgaaaaggtagcgcgtatttctggttttagtgaatct 
catcctttacaagaagaagaacacgttcaaggttctcttgaaattatatatagtttacaa 

gaagaattgaaggaaattactggtatggatgaagttaccctacaacctgctgcaggtgca
catggtgagtggactgctttaatgattttcaaagcttatcatgaaaaaaatggacaaagc 
catcgtgatgaagtaatagtgcctgattcagcacatggtactaatcctgcttctgcctca 
tttgctggatttaaatcagtaactgtaaaatctaatcaacgtggggaagttgacatagaa 
gatttaaaaagagtagtaaacgataatacagctgcaatcatgttaactaatccaaataca 

ttaggtatatttgaacaggatattattgaaatagggaaaatcgttcatgaagcaggaggt
ttattatattacgatggagcaaatttaaatgctattttagataaagtacgtcctggtgat 
atgggctttgatgcggtacatcttaatttgcacaaaacattcactggtcctcatggcggt 
ggtggaccaggatcaggaccagttggagtagtagagaaattagccagttatctacctaag 
cctatggttataaaagataacgataggtataaatatgataatgatattccaaattcaatt 

ggacgagtaaaaccgttttatggaaatttcggcatttatttaagagcatatacttatatc
agatcaatgggagccaatggtttaaaagaagtatctgaagctgccgttcttaatgcgaat 
tatataaaatctcgccttaaaaatcactttgaaattccgttcaatcaatattgtaaacat 
gaatttgtattaagtggaactttacaaaaacaatatggtgtcagaacattagatatggct 
aagcgactgttagattttggtgtgcatccacctacaatatattttcctctcaatgtcgaa 
gaaggaatgatgattgagccaacagaaactgaatctaaagaaacacttgattactttatt
gatgcgatgattcaaatcgctgacgaaacaaaaaatgatccagataaagttttagaagca 
ccacatacgactataattgatcgattagatgagaccactgcagcacgaaaaccaattctt 
aaatttgaagaacttaaggacgaaaagtataaagaacacacaaatattgattctgaagat 
aattaa

Sequence 144 

MKYNPKINEKVARISGFSESHPLQEEEHVQGSLEIIYSLQEELKEITGMDEVTLQPAAGA 
HGEWTALMIFKAYHEKNGQSHRDEVIVPDSAHGTNPASASFAGFKSVTVKSNQRGEVDIE 

DLKRVVNDNTAAIMLTNPNTLGIFEQDIIEIGKIVHEAGGLLYYDGANLNAILDKVRPGD
MGFDAVHLNLHKTFTGPHGGGGPGSGPVGVVEKLASYLPKPMVIKDNDRYKYDNDIPNSI
GRVKPFYGNFGIYLRAYTYIRSMGANGLKEVSEAAVLNANYIKSRLKNHFEIPFNQYCKH
EFVLSGTLQKQYGVRTLDMAKRLLDFGVHPPTIYFPLNVEEGMMIEPTETESKETLDYFI
DAMIQIADETKNDPDKVLEAPHTTIIDRLDETTAARKPILKFEELKDEKYKEHTNIDSED
N*

Sequence 145 
Contig_0450_pos_0_722,
is similar to (with p-value 2.0e-53)
>sp:sp|P54511|YQHM_BACSU HYPOTHETICAL 22.8 KD PROTEIN IN GCV
T-SPOIIIAA INTERGENIC REGION. >gp:gp|D84432|BACJH642_198 Bac
illus subtilis DNA, 283 Kb region containing skin element. N
ID: g2627063.

atgattatgactgaaatatggaattttataaatactggaagcaaaaatccttattataat 
atggcaatggacgaagcgttactaaattttgtatcgcgtggagaaatcgatccagttata
agattttatacttggaatcctgcaacactctcaataggctactttcagcgtctccaaaaa 
gaaattgatattgataaagtaaaagaaaagggctatggcttagtaagacgtcaaacgggt 
ggtagaggcgtgttacacgataaagaattaacatatagcgttattgttcctgagtctcat
ccaaatatgccttcaactgtaactgaagcttataaaattatttcacaaggattattagaa 
ggttttaaaaatttaggttttgaaacttatttcgctatcccccgttctaaagaagaacga
gacaaattaaagcaaccacgaagttcagtatgttttgatgcacctagttggtatgagctt 
gtagtagaaggcagaaaaattgcaggtagcgctcaaaccagacaaaaaggtgtcattctt 
caacatggttcaattttacaagatatagatatcgatgatttatttgatatgtttaaattt 
aaaaatgaacgactaaaagcaaaaatgaaagaaaattttgttcaaaaagctgtagctatt 
aatgacatttcaaatcaacatattacattaaatgaaatggagaacgcctttgaggcaggt
tt 

Sequence 146 

MIMTEIWNFINTGSKNPYYNMAMDEALLNFVSRGEIDPVIRFYTWNPATLSIGYFQRLQK 
EIDIDKVKEKGYGLVRRQTGGRGVLHDKELTYSVIVPESHPNMPSTVTEAYKIISQGLLE
GFKNLGFETYFAIPRSKEERDKLKQPRSSVCFDAPSWYELVVEGRKIAGSAQTRQKGVIL
QHGSILQDIDIDDLFDMFKFKNERLKAKMKENFVQKAVAINDISNQHITLNEMENAFEAG
X 

Sequence 147

Contig_0451_pos_2108_3121,

putative peptide of unknown function 

atggaacgattttgttgtgtaaatcaaattaactatattcaaatgaatccgttagaagcc 
aaatttaaaacgagcgctctaagatcatggaaaactgatcaggcagatgctcataagctt 

gcttgtttaggaccgacgctcaaacaaacaggcagcttacctatacatgagttaatattc
tttgaattaagagaacgtgcccgttttcatctagaaatcgagaatgaacaaaatcgactt 
aaatttcagattcttgaattactccatcaaacattccctggtttagaaagattatttagt
agtcgatattcaatcattgcactcaacatcgcagaaatttttactcatccagacgtggtt 
cttgatatcgacaaggatgtacttattacacatatattcaattctacagataagggaatg 

tcaatggataaagctacaaaatatgcacttcaattaagagtgattgctcaagaaagctat
cctaatgtcgatagacattcctttctagtcgaaaaattacgcttacttattcaacaatta 
aaacaatctattcatcatctcaaacaattagatgatgccatgattcaattagcacaacaa 
ctcgattattttgaaaatattcattcgatacctggtattggtaagctaagcacagctatg 
attattggggagattggtgatattaagcgatttaaatcaaataaacaactcaacgctttt 
gtaggcattgatatcaaacgatatcaatcaggtcatacacactgtagagataccatcaac
aagcgtggtaataaaaaagcgagaaaacttttattttgggtgattatgaatataataaga 
gggcagcatcattatgacaatcatgtcgtcgattattactacaaactaagaaagcagcct 
aatgagaaacctcataagactgccatcattgcttgtataaatcgattattaaaaacgatt 
cattatctggtaatgaatcataaattgtacgattatcaaatgtcaccacattag



Sequence 148

MERFCCVNQINYIQMNPLEAKFKTSALRSWKTDQADAHKLACLGPTLKQTGSLPIHELIF
FELRERARFHLEIENEQNRLKFQILELLHQTFPGLERLFSSRYSIIALNIAEIFTHPDVV
LDIDKDVLITHIFNSTDKGMSMDKATKYALQLRVIAQESYPNVDRHSFLVEKLRLLIQQL
KQSIHHLKQLDDAMIQLAQQLDYFENIHSIPGIGKLSTAMIIGEIGDIKRFKSNKQLNAF
VGIDIKRYQSGHTHCRDTINKRGNKKARKLLFWVIMNIIRGQHHYDNHVVDYYYKLRKQP
NEKPHKTAIIACINRLLKTIHYLVMNHKLYDYQMSPH*

Sequence 149

Contig_0451_pos_4254_4778,

putative peptide of unknown function 

atgcttgatacatctgacatcgaagggaaaacgattttagatgtgggatgtaatcaaggc
ggatttttacgacagttatacgatacaacaccgtttaaaaaaggtgttggcatagattta 
gcacgtttatctttggaaaaggcagagacattaaaaggacaacgtccacttacatactat 
ttaacagataaaccgcaagaaacgaagcacgtgtttgatacggcagtaagtacgtctgtc 
ttgtacttaatagaagatattccgcaacatgcaaaagatttaaaagaggtattgaaacca 

ggcggtgtttattacgcttcattcgcggatttaactaataacccaagtcgtcagtttatg
gatgacacgattaatcaatatggtgcaacaccttctcagaatcactctctaaaacatatc 
gttgatagctttgtggatgcaggatttgaagttgcagtaatgaaaaagcatgtacttata 
aatatttatctaaaaagattggatcaatcgttgaaacaactttaa 

Sequence 150

MLDTSDIEGKTILDVGCNQGGFLRQLYDTTPFKKGVGIDLARLSLEKAETLKGQRPLTYY 
LTDKPQETKHVFDTAVSTSVLYLIEDIPQHAKDLKEVLKPGGVYYASFADLTNNPSRQFM
DDTINQYGATPSQNHSLKHIVDSFVDAGFEVAVMKKHVLINIYLKRLDQSLKQL*

Sequence 151

Contig_0451_pos_5060_5626,

is similar to (with p-value 1.0e-19)

>sp:sp|P23477|ADDB_BACSU ATP-DEPENDENT NUCLEASE SUBUNIT B. >
pir:pir|A39432|A39432 ATP-dependent exonuclease synthesis pr
otein AddB - Bacillus subtilis >gp:gp|M63489|BACADDAA_1 Baci
llus subtilis ATP-dependent nuclease (addA) and (addB), and
open reading frame 3, partial cds. NID: g142438. >gp:gp|Z991
09|BSUB0006_138 Bacillus subtilis complete genome (section 6
of 21): from 999501 to 1209940. NID: g2633260. >gp:gp|Y1408
1|BSY14081_20 Bacillus subtilis chromosomal DNA, region 92 d
egrees: region between comK and addAB. NID: g2226171.
atggatattgtattacaaaacaaggagcgtttaggtcttacagatattgtgaaaccaggg 
ggtctactttatttccatgtccatgaaccgcgtattaaatttaaaagttgggcagatata 
gatgaagaccaatttcaaaaagactatatcaaaaactttaaaatgagtggtttgcttaat 

cgtgaccaagaagtgttagacgctttagatattagacttgaaccaaagtataattcggat
attgttccaatagcattaacagctaaaggcgctataaatcaacgtagtagtaaagtagct 
gatgaaaacatcatttatcaattaatagaacataataagaagaattttatcgagacagcc 
agccacattatggatggacatacggaagtggcacccttgaagtacaaacaagtattacct 
tgtcaattttgtaattataaatcagtttgtcatgtagacggattaatagatagtaagcgt 

tatagaacagtagatgaatcgataaaaccattagatttaattcaacaattaagaaatgaa
ggtggtgaaagacatgattccaactaa 

Sequence 152 

MDIVLQNKERLGLTDIVKPGGLLYFHVHEPRIKFKSWADIDEDQFQKDYIKNFKMSGLLN 
RDQEVLDALDIRLEPKYNSDIVPIALTAKGAINQRSSKVADENIIYQLIEHNKKNFIETA
SHIMDGHTEVAPLKYKQVLPCQFCNYKSVCHVDGLIDSKRYRTVDESIKPLDLIQQLRNE 
GGERHDSN* 


Sequence 153 

Contig_0451_pos_6249_9269,

is similar to (with p-value 0.0e+00)

>sp:sp|P23478|ADDA_BACSU ATP-DEPENDENT NUCLEASE SUBUNIT A. >
pir:pir|B39432|B39432 ATP-dependent exonuclease synthesis pr
otein AddA - Bacillus subtilis >gp:gp|M63489|BACADDAA_2 Baci
llus subtilis ATP-dependent nuclease (addA) and (addB), and
open reading frame 3, partial cds. NID: g142438. >gp:gp|Z991
09|BSUB0006_139 Bacillus subtilis complete genome (section 6
of 21): from 999501 to 1209940. NID: g2633260. >gp:gp|Y1408
1|BSY14081_21 Bacillus subtilis chromosomal DNA, region 92 d
egrees: region between comK and addAB. NID: g2226171.
atgcagcttatcaatgatttagcaatgatttttatgaaagcaggatatgaggaattacaa 

aaaagttatgacttattctcaatgatggaaagtgttgataagcagcttgaagttattgaa
accgaacgcatgtttattactaaagctattgaaggtaaagtattaaatacagatgttatc 
acgcaacatgaatttatgagtcgttttccggcaataaatagcaagataaaagaagcaaat 
gaaggcatggaagatgctttaaatgaagcaaaacaacattatgataaatataaatcttta 
gttatgaaagtaaagaatgattatttttctagaaatgcagaagatttgcaaagagatatg 

caacaactcgcacctcgagtggcttatttagctcaaatagttcaagatgtgattcaatca
tttggtgttcaaaaacgaagtcgtaatattttggatttttcagattatgaacattttgca 
ttacgcattcttactaacgaagatggctcaccttcgcgtatcgctgaaacgtatcgtgaa 
cattttaaagaaatcctagttgatgagtatcaagatactaatagagtgcaagaaaaaata 
ttatcttgtattaaaactggtgaagaacacgatggtaacttgttcatggttggggatgtg 

aagcagtctatttataaatttagacaagctgatcctagtttatttattgaaaaatataat
cgcttttctagtagtggaaatgaaagtggcttgcgcattgacttatcgcaaaactttcgt 
tcgagacaggaagtgttatctacaaccaattacttgttcaaacatatgatggatgaacaa
gtaggagaaatttcatatgatgatgcagcgcaattgtattttggtgcaccatatgacgaa 
gtttcacatcctgttcaattacgagcacttgttgaggcaagttcagaaaatagtgactta 

actggaagtgaacaagaagcgaattacattgttgaacaagttaaagatattattaatcat
caaaacgtatacgatatgaaaacaggtcaatacagaaaagcaacatataaagatatcgta 
attttagagcgaagttttggtcaagcgcgtaatcttcaacaagcttttaaaaataatgat 
atcccttttcacgtaaatagtaaggaagggtattttgagcaaactgaagtacgtcttgtg 
ctttcatttttaagaacaatagataatccacttcaagacatttatttagtgggattgatg 

cgttctgtaatatatcaatttactgaagaagaattagctgaaataagagttgtaagccct
catgatgattacttttatcaatctataaaaaattatatgattgatgaaaaagctgattct 
agattggttgacaagttaaatcgttttattcaggatatacaaaaatatcaaaattatagt 
ctaagtcaaccggtttaccaattaattgataaattttataatgatcattttgtaattcag 
tactttagcggtcttattggaggtaaaggtagaagagcaaatctgtatgggctatttaat 

aaagctgttgaatttgaaaattcaagtttcagaggtttattccaatttattcgttttatt
gatgagcttattgatcgtaaaaaagattttggtgaagaaaatgtcgtaggtcctaacgat 
aatgtggttagaatgatgacgattcacagtagtaaaggattagaatttccatttgtaatt 
tactcaggattatctaaaaaattcaacaaaggtgacctgaatgcaccagttattctaaat 
caacaatatggtttaggtatggattattttgatgtaaataaagatatggcttttccttca 

cttgcctctgtggcatatagagcaataaatgaaaaagaacttatatcagaagagatgcgt
ttaatctatgttgcgttgacacgagcaaaagagcaacttattttagttggaagagtcaaa 
gatgaaaagtcgttaattaaatatgaacaattagctgtttcagacacacatatagcagtt 
aatgaacgccttactgctaccaatccatttgttctaatttatggtgttttggctaagcat 
caatcgccttcattgccaaatgatcaaagatttgaaagagatattgatcaattaaattct 

gaagtgaagccacgtgtatcaatagtgattgatcattatgaggatgtttcaactgaagaa
gtagtcaatgataatgaaataagaacaatcgaagaattaaaggccataaatactggtaat 
gaagatgtgaaaattaaaattcatcaacagctttcttatgactatccttttaaagttaac 
acgatgaaaccatctaaacagtcggtatcagagttaaaacgtcaattagaaactgaagaa 
agtaatacaaattatgatagagtacgtcaatatcgtattggtgttgcatcatatgaaaga 
cccaagtttcttacccaaacaaaaaaaagaaaagcaaatgaaatagggactttaatgcat
acagtcatgcaacacttaccttttagagaacaacgtttaacaaaagacgaattatttcaa 
tatatcgatcgattgattgacaaacaacttattgatgaagatgcaaaagaggatattaga 
atagatgagattatgcatttcattgatggccctctctatatggaaatagctcaagctgac 
aatgtttatactgaattaccttttgtggtaaatcaaattaaagttgatggacttacaagt
gaagatgaagatgtatccattattcaaggtatgattgatttaatatatgaaagtgacgga 
caattttactttgttgattacaaaacagatgcttttaatagaagaaaaggtatgagtgat 
gaagaaatagggaatcagctcaaagaaaaatatcagatacaaatgacgtattatcgaaat 
actttagaaaccatacttaaacgacccgtaaagggttacttatattttttcaaatttggt 
acattagaaatagatgattaa

Sequence 154 

MQLINDLAMIFMKAGYEELQKSYDLFSMMESVDKQLEVIETERMFITKAIEGKVLNTDVI 
TQHEFMSRFPAINSKIKEANEGMEDALNEAKQHYDKYKSLVMKVKNDYFSRNAEDLQRDM 

QQLAPRVAYLAQIVQDVIQSFGVQKRSRNILDFSDYEHFALRILTNEDGSPSRIAETYRE
HFKEILVDEYQDTNRVQEKILSCIKTGEEHDGNLFMVGDVKQSIYKFRQADPSLFIEKYN 
RFSSSGNESGLRIDLSQNFRSRQEVLSTTNYLFKHMMDEQVGEISYDDAAQLYFGAPYDE 
VSHPVQLRALVEASSENSDLTGSEQEANYIVEQVKDIINHQNVYDMKTGQYRKATYKDIV 
ILERSFGQARNLQQAFKNNDIPFHVNSKEGYFEQTEVRLVLSFLRTIDNPLQDIYLVGLM
RSVIYQFTEEELAEIRVVSPHDDYFYQSIKNYMIDEKADSRLVDKLNRFIQDIQKYQNYS
LSQPVYQLIDKFYNDHFVIQYFSGLIGGKGRRANLYGLFNKAVEFENSSFRGLFQFIRFI
DELIDRKKDFGEENVVGPNDNVVRMMTIHSSKGLEFPFVIYSGLSKKFNKGDLNAPVILN
QQYGLGMDYFDVNKDMAFPSLASVAYRAINEKELISEEMRLIYVALTRAKEQLILVGRVK
DEKSLIKYEQLAVSDTHIAVNERLTATNPFVLIYGVLAKHQSPSLPNDQRFERDIDQLNS

EVKPRVSIVIDHYEDVSTEEVVNDNEIRTIEELKAINTGNEDVKIKIHQQLSYDYPFKVN
TMKPSKQSVSELKRQLETEESNTNYDRVRQYRIGVASYERPKFLTQTKKRKANEIGTLMH 
TVMQHLPFREQRLTKDELFQYIDRLIDKQLIDEDAKEDIRIDEIMHFIDGPLYMEIAQAD 
NVYTELPFVVNQIKVDGLTSEDEDVSIIQGMIDLIYESDGQFYFVDYKTDAFNRRKGMSD
EEIGNQLKEKYQIQMTYYRNTLETILKRPVKGYLYFFKFGTLEIDD* 


Sequence 155 

Contig_0451_pos_11640_12653,
putative peptide of unknown function 

atggaacgattttgttgtgtaaatcaaattaactatattcaaatgaatccgttagaagcc 

aaatttaaaacgagcgctctaagatcatggaaaactgatcaggcagatgctcataagctt
gcttgtttaggaccgacgcttaaacaaacagacaacttacctatacatgagttaatattc
tttgaattaagagaacgcgtccgttttcatctagaaatcgagaatgaacaaaatcgactt 
aaatttcagatccttgaattactccatcaaacattccctggtttagaaagattgtttagt 
agtcgatattcaatcattgcactcaacatcgcagaaatctttactcatccagacatggtt 

cttgatatcgacaaggaggtactgattacacatatattcaattctacagataagggaatg
tcaatggataaagctacaaaatatgcacttcaattaagggtgattgctcaagaaagctat 
cctaatgtcgatagacattcctttctagtcgaaaaattacgcttacttattcaacaatta 
aaacaatctattcatcatctcaaacaattagatgatgccatgattcaattagcacaacaa 
ctcgattattttgaaaatattcattcgatacctggtattggtaagctaagcacagctatg

attattggggagattggtgatattaagcgatttaaatcaaataaacaactcaatgctttt
gttggcattgatatcaaacgatatcaatcaggtcatacacactgtagagataccatcaac 
aagcgtggtaataaaaaagcgagaaaacttttattttgggtgattatgaatataataaga 
gggcagcatcattatgacaatcatgtcgtcgattattactacaaactaagaaagcagcct 
aatgagaaacctcataagactgccatcattgcttgtataaatcgattattaaaaacaatt

cattatcttgtaatgaatcataaattgtacgattatcaaatgtcaccacattag

Sequence 156 

MERFCCVNQINYIQMNPLEAKFKTSALRSWKTDQADAHKLACLGPTLKQTDNLPIHELIF 
FELRERVRFHLEIENEQNRLKFQILELLHQTFPGLERLFSSRYSIIALNIAEIFTHPDMV 
LDIDKEVLITHIFNSTDKGMSMDKATKYALQLRVIAQESYPNVDRHSFLVEKLRLLIQQL
KQSIHHLKQLDDAMIQLAQQLDYFENIHSIPGIGKLSTAMIIGEIGDIKRFKSNKQLNAF
VGIDIKRYQSGHTHCRDTINKRGNKKARKLLFWVIMNIIRGQHHYDNHVVDYYYKLRKQP
NEKPHKTAIIACINRLLKTIHYLVMNHKLYDYQMSPH* 
Sequence 157

Contig_0451_pos_15717_16034,
is similar to (with p-value 1.0e-38)

>gp:gp|Z79580|BS168NPRB_7 B.subtilis nprB gene. NID: g162092
1. >gp:gp|Z99109|BSUB0006_192 Bacillus subtilis complete gen
ome (section 6 of 21): from 999501 to 1209940. NID: g2633260
. >gp:gp|Y09476|BSY09476_56 B.subtilis 54kb genomic DNA frag
ment. NID: g2145361.

gtgatactgatggaagaagcactaaaagatagtatcttaggcgctcttgaaatggtaata 
gatcctgagttagggatagatatcgttaatttaggtttagtatataaagttgatgttgat 
gatgaaggtttatgtacagttgaaatgacattgacttcgatgggatgtccattaggacca 
caaattattgaacaagttaagagtgttttggctgagattcctgaaatttctgatacagaa 

gtgatgattgtatggagtccaccttggaataaagatatgatgtcacgatatgccaaaata
gctttaggcatcggataa 

Sequence 158 

VILMEEALKDSILGALEMVIDPELGIDIVNLGLVYKVDVDDEGLCTVEMTLTSMGCPLGP 
QIIEQVKSVLAEIPEISDTEVMIVWSPPWNKDMMSRYAKIALGIG*



Sequence 159 

Contig_0451_pos_16480_17565,

is similar to (with p-value 8.0e-75)

>gp:gp|U93874|BSU93874_12 Bacillus subtilis cysteine synthas
e (yrhA), cystathionine gamma-lyase (yrhB), YrhC (yrhC), Yrh
D (yrhD), formate dehydrogenase chain A (yrhE), YrhF (yrhF),
formate dehydrogenase (yrhG), YrhH (yrhH), regulatory prote
in (yrhI), cytochrome P450 102 (yrhJ), YrhK (yrhK), hypothet
ical protein YrhL (yrhL), putative anti-SigV factor (yrhM),
RNA polymerase sigma factor SigV (sigV) and YrhO (yrhO) gene
s, complete cds, and YrhP (yrhP) gene, partial cds. NID: g19
34604. >gp:gp|Z99117|BSUB0014_194 Bacillus subtilis complete
genome (section 14 of 21): from 2599451 to 2812870. NID: g2
634966.

atgcctggattagatggtttgcgagcaattgcagtcattggtattattatttatcacttg 
aataaacaatggttaacaggtggttttttaggcgtagatactttttttgttatttcaggt 
tatttgattacgagcttattacttaaagagtatgaagatactggaacaataaatcttaaa
aatttttggattcgtcgtattaaaaggttattaccagcggtatttgcattaatagtagta 

gttggaattgcaactttattattgcaccccgagcatattgtaagagttaaacatgatatg
atagcagcaatattttacgtatctaattggtggtatattgctaaagatgtcaattatttc 
gagcaattttcttttatgcctttaaagcacctatggtcactagccattgaagagcagttt 
tacctttttttcccagcagtactcttattatttatggcaatagttaagaaaaagaaaaat 
gtcatactgatgttttggatcatatccctggtttcattattaatgatggttgttatttct 

caacctcacttgaaccattctagagtatattttggaactgatacaagattgcagacactg
cttttaggtgtacttctagcatttatctggccaccttttaaattaaatcccaatccacct 
aaaggattaaaaactgtgattaatagtgcgggtatcataggacttacatttttaattcta 
ttattctttactgttagtgatgaaagtgattggatttataacggtggattttatcttatt 
tcaacaatgactttgctaattattgcaagtgttgttcatccaacgacaattttagctaag 

ttattaggaaatcctttatttgtctacattggaaagcgttcatacagtttatacttatgg
cattttcctgtaattagctttattcatagttattttattgatggtcaattaccaacttat 
gtttatattatggatatcgtaattactgtattattagccgaattatcatttagatatgtt 
gaaacgccattaagaaaggaaggtctaaaggcttttacagtgtgctcccttaaaaattat 
ttttag 



Sequence 160 

MPGLDGLRAIAVIGIIIYHLNKQWLTGGFLGVDTFFVISGYLITSLLLKEYEDTGTINLK
NFWIRRIKRLLPAVFALIVVVGIATLLLHPEHIVRVKHDMIAAIFYVSNWWYIAKDVNYF
EQFSFMPLKHLWSLAIEEQFYLFFPAVLLLFMAIVKKKKNVILMFWIISLVSLLMMVVIS
QPHLNHSRVYFGTDTRLQTLLLGVLLAFIWPPFKLNPNPPKGLKTVINSAGIIGLTFLIL
LFFTVSDESDWIYNGGFYLISTMTLLIIASVVHPTTILAKLLGNPLFVYIGKRSYSLYLW
HFPVISFIHSYFIDGQLPTYVYIMDIVITVLLAELSFRYVETPLRKEGLKAFTVCSLKNY
F* 

Sequence 161 

Contig_0451_pos_17821_0,

is similar to (with p-value 6.0e-46)

>sp:sp|P49022|PIP_LACLA PHAGE INFECTION PROTEIN. >gp:gp|L146
79|LACPIP_1 Lactococcus lactis pip and gerC2 genes, complete
cds's, and rrg gene, 5' end of cds. NID: g308860.
atgaaaaacgcactaaaactttttatcacggatttaaaaagagttgctaaaacaccaggt 

gtatgggtcatcttagctggtttagcaattcttccttcattctatgcatggtttaacctc
tgggctatgtgggatccgtatggtcatacaggacatatcaaagttgccgtagtgaatgaa 
gaccaaggtgaaaaagttcgtggtaagaatattaatgtaggaaataaaatggtcaaaact 
ttaaaaaagaatgatagttttgactggcaatttgtgagtagagaaaaagccgaccatgaa 
attaagatgggaaaatattatgcaggtatttatataccgaagaaattcacacatgaaatc 

actggtactttaagaaaacatcctcaaaaggcggatatagattttaaagtaaatcagaag
attaatgctgtagcagctaagttaaccgatacgggatcgtcgtttgtgattgataaagca 
aataaacaatttaacaaaaccgtagcaaccgctttactttctgaagctaataaagtcgga 
ctatcaattgaagataatgtacctacaatcaataaaattaagagtgctgtatatcaagct
aataattcattgcctaaaattaatcaatttgcagacaagattattgaactaaataaacat 

caagacgatttggatgcttatgctaatcaatttagaagtttaggaaagtat

Sequence 162 

MKNALKLFITDLKRVAKTPGVWVILAGLAILPSFYAWFNLWAMWDPYGHTGHIKVAVVNE 
DQGEKVRGKNINVGNKMVKTLKKNDSFDWQFVSREKADHEIKMGKYYAGIYIPKKFTHEI
TGTLRKHPQKADIDFKVNQKINAVAAKLTDTGSSFVIDKANKQFNKTVATALLSEANKVG
LSIEDNVPTINKIKSAVYQANNSLPKINQFADKIIELNKHQDDLDAYANQFRSLGKY 

Sequence 163 

Contig_0451_pos_16054_15662,

is similar to (with p-value 2.0e-38)

>gp:gp|Z79580|BS168NPRB_7 B. subtilis nprB gene. NID: g162092
1. >gp:gp|Z99109|BSUB0006_192 Bacillus subtilis complete gen
ome (section 6 of 21): from 999501 to 1209940. NID: g2633260
. >gp:gp|Y09476|BSY09476_56 B. subtilis 54kb genomic DNA frag
ment. NID: g2145361.

atgagtagctatcctttatattatccgatgcctaaagctattttggcatatcgtgacatc 
atatctttattccaaggtggactccatacaatcatcacttctgtatcagaaatttcagga 
atctcagccaaaacactcttaacttgttcaataatttgtggtcctaatggacatcccatc 
gaagtcaatgtcatttcaactgtacataaaccttcatcatcaacatcaactttatatact

aaacctaaattaacgatatctatccctaactcaggatctattaccatttcaagagcgcct
aagatactatcttttagtgcttcttccatcagtatcacctctttaaaattttctttacac 
caatatatcaaatatccgacaaaacgccaataa 

Sequence 164 

MSSYPLYYPMPKAILAYRDIISLFQGGLHTIITSVSEISGISAKTLLTCSIICGPNGHPI
EVNVISTVHKPSSSTSTLYTKPKLTISIPNSGSITISRAPKILSFSASSISITSLKFSLH 
QYIKYPTKRQ* 



Sequence 165 

Contig_0451_pos_15613_14792,

is similar to (with p-value 1.0e-74)

>gp:gp|Z79580|BS168NPRB_5 B. subtilis nprB gene. NID: g162092
1. >gp:gp|Z99109|BSUB0006_190 Bacillus subtilis complete gen
ome (section 6 of 21): from 999501 to 1209940. NID: g2633260
. >gp:gp|Y09476|BSY09476_54 B. subtilis 54kb genomic DNA frag
ment. NID: g2145361.
atgcaaccttatttaatttgtctagatctagatggtacattattaaatgacaataaagaa
atctcaccttacactaaacaagtattaaccgaattacaacaatgtggacactacgttatg 
attgctactggaagaccctatcgcgcaagccagatgtattatcatgaactaaatatgagc 
acacctgttgttaactttaatggagcatttgtacatcatccaaaagcaaacgattttaaa 
gtgatacatgaagtacttgatgtggaaatttctaaaaatattattacagcacttcaacaa 

tctcatattacaaatatcattgctgaagtaaaagactatgtctttataaatagttatgat
tcaagactttacgaaggtttttcaatgggaaatcctaaaattcaaacaggtaatttactt 
gaaaatcttaatgaagcacctacgtcattacttgttgaagcagaagaagaaaatattcct 
gaaattaaagatatgttaacacatttttatgcagaaaatattgaacatcgtcgttggggc 
gcaccgtttccagtaatagaaattgtgaagcgtgggattaacaaagcacgtggaatcaag 

catgttcaaaactatttaaacatcgccgacgatcatatcattgcgtttggtgatgaggac
aatgatatagaaatgataaagtttgcgacccatggcattgcaatggccaatggcttgaaa 
gatttaaaggaaatagcaaatgagactacgtatagtaataatgaagacggaataggtcgt 
tatttaaatgacttttttaatttgaaaatacgttattattaa 

Sequence 166

MQPYLICLDLDGTLLNDNKEISPYTKQVLTELQQCGHYVMIATGRPYRASQMYYHELNMS 
TPVVNFNGAFVHHPKANDFKVIHEVLDVEISKNIITALQQSHITNIIAEVKDYVFINSYD
SRLYEGFSMGNPKIQTGNLLENLNEAPTSLLVEAEEENIPEIKDMLTHFYAENIEHRRWG 
APFPVIEIVKRGINKARGIKHVQNYLNIADDHIIAFGDEDNDIEMIKFATHGIAMANGLK 

DLKEIANETTYSNNEDGIGRYLNDFFNLKIRYY*

Sequence 167 

Contig_0451_pos_14714_13398,

is similar to (with p-value 0.0e+00)

>gp:gp|AF041467|AF041467_1 Staphylococcus aureus coenzyme A
disulfide reductase gene, complete cds. NID: g2792489.
atgaataaaattataatagtcggtgcagttgctggtggtgcgacttgtgcaagtcaaatt 
cgaagattagataaagagagtgaaatcattgtttttgaaaaagatagagacatgagcttt 
gctaattgtgcattaccttattatattggcaacgttatcgaggaccgtcgtaaagtttta

gcatacacgcccaatcaattttatgacaaaaagcaaatcactgtaaaaacataccatgaa
gttatacaaatcaatgatgagagacaaacagttactgtcttaaatcatcaaactaatcaa
acttttgaagaaagttacgatacattgattttaagtcctggcgcatctgcaaatcgatta 
aacactcatagtgatatctcatttactgtgcgaaatctcgaagatactgaaacaattgat 
acctttattacgaataccaaagcacaacgtgcacttgttgttggcgcgggttacatctct 

ttagaagtccttgaaaatttacatcatagaggtttggatgtcacatggattcatcgctct
acaaatattaataaactgatggatcaagatatgaatcaacccatcatcgacgaaatagaa 
aagagaaatatcacttatagatttaacgaagaaattagtcacgtaaatggacatgaagtt 
acattcacatctggtaaagttgaaaactttgatcttattatcgaaggtgtaggtactcat 
ccaaattcacaatttattaaatcatctaacgtcatactgaatgataaaggttatatccca

gtaaatcataatttccaaacaaatataccaaatatttatgcattaggtgatgttattact
tcacattatcgtcatgtgaatttaccggcacaggttccacttgcttggggagcacaccgt 
ggtgcaagtattatagctgaacaactttctggaaattcgtctattcactttaaaggttat 
ctaggaaataatatagtgaaattttttgactatacattagcaagtgttggcatcaaacca 
aatgaacttaaaaatttcgattatgatatggttgaagttaagcaaggagctcatgcagga 

tattacccaggaaattcaccactacatttacgtgtttattttgaaaaagactcgagaaaa
cttatacgcgcagcagcagttggtaaacaaggtgccgataaaagaatagacgtattatca
atggcaatgatgaataatgctactgtggatgatttaacagaatttgaagtagcatatgca 
cctccttatagtcatccaaaagatttaattaatttaattgggtataaagcgcaataa 

Sequence 168

MNKIIIVGAVAGGATCASQIRRLDKESEIIVFEKDRDMSFANCALPYYIGNVIEDRRKVL
AYTPNQFYDKKQITVKTYHEVIQINDERQTVTVLNHQTNQTFEESYDTLILSPGASANRL
NTHSDISFTVRNLEDTETIDTFITNTKAQRALVVGAGYISLEVLENLHHRGLDVTWIHRS
TNINKLMDQDMNQPIIDEIEKRNITYRFNEEISHVNGHEVTFTSGKVENFDLIIEGVGTH
PNSQFIKSSNVILNDKGYIPVNHNFQTNIPNIYALGDVITSHYRHVNLPAQVPLAWGAHR
GASIIAEQLSGNSSIHFKGYLGNNIVKFFDYTLASVGIKPNELKNFDYDMVEVKQGAHAG 
YYPGNSPLHLRVYFEKDSRKLIRAAAVGKQGADKRIDVLSMAMMNNATVDDLTEFEVAYA 
PPYSHPKDLINLIGYKAQ* 


Sequence 169 

Contig_0452_pos_547_1323,

is similar to (with p-value 8.0e-40)

>sp:sp|Q57629|Y165_METJA HYPOTHETICAL PROTEIN MJ0165. >gp:gp
|U67473|U67473_9 Methanococcus jannaschii section 15 of 150
of the complete genome. NID: g2826256.

atgagccatagttataattctatagaagaggtgctcaaagctgtaaaatcaaatcaacta 
tctattaatgatgctaaagcccaactcagtcattatgacgaattgggctttgctaaaatt 
gacttacatagagcacagcgtcaaggatttcccgaagttatctttgggcaaggaaaaaca 

aaagaacaaatcactaaaatcatctctagtttgatatttcataatgaagttattctagtg
acacgtgttgatgaaatgaaagcaaaatacattttacaacattatccaaacttggaatat 
catcaaactgcacagttaattagcactccactaaaagatataccacaatctaaatactat 
gtttctgtactttgtgctggaacttctgatttacctattgcagaagaagctgcattaacc 
gctgaaatcatgggagtaagtgtaaaacgattttatgatgtcggggtttcaggtattcat

cgcttattatccaacattcatgatatacgcagagggaaagtttctatcgttatagctgga
atggaaggcgctttagcaagtgttgttggaggattagtcaaccaccctgtatatgcagta 
ccaacgagtgtaggttatggagcaaacttgaatggggttaccaccctattatcaatgata 
aatagttgcgcacccggaaccagcgtattaaatatcaataatggatttggtggcggttac 
aacgctgcacagattattcatatgctagaaaataaagagagtgaggtatctttatga 


Sequence 170 

MSHSYNSIEEVLKAVKSNQLSINDAKAQLSHYDELGFAKIDLHRAQRQGFPEVIFGQGKT 
KEQITKIISSLIFHNEVILVTRVDEMKAKYILQHYPNLEYHQTAQLISTPLKDIPQSKYY 
VSVLCAGTSDLPIAEEAALTAEIMGVSVKRFYDVGVSGIHRLLSNIHDIRRGKVSIVIAG
MEGALASVVGGLVNHPVYAVPTSVGYGANLNGVTTLLSMINSCAPGTSVLNINNGFGGGY
NAAQIIHMLENKESEVSL* 

Sequence 171 

Contig_0452_pos_1368_2507,

putative peptide of unknown function

atgctactttctgctttagttgatttaggagcaaaccctgaagacattgaatcagaacta
aaaaaattacctttagatcaatttaagctacattttcaaaaaagagtaaaacaaggtatt
catgcaatgacattaaacattgatgttaaagaagcaaatcatcatcgtcacgttaatgat 
atatttaaaatgatagatgacagtacacttccggaaagggttaaatatcgcagtaagaaa 

atttttgaaatcattggtcaagcagaagctaaaattcatggcatgtcgtttgaagaagtt
cactttcatgaagtgggggcaatggactctattatagatattattggtgggtgtattgca 
ctagaacaactagggattaacacattatactgttcagctattccaacaggtcatggtaaa 
atcaatattgctcatggcatttatccaatccctgcaccagctactgctgaaattcttaaa 
ggtataccaatcgcacattttgatgttcaaagtgaactcacaacccctactggtgctgca 

tttgctaagggacttgtttcatcgtttgggccatttccttcagcaacaatacaacatata
ggctatggcgccggcagtaaggattttgatttccctaatatattaagggttattcaattt 
gaatctgaattcgagcaacaagatagcgtccaagtaatagagtgtcaaatagatgatatg 
acacctgaagcattaggttattttatgaataatgcgttagagcaaggtgctttagatgct 
tactatacgcctatatttatgaaaaaaagtcgcccaagcacgcagttaacgttaatatgt 

aaattacatgataagacatatttcgaacaacttatcttacaagaaacaagttctttaggc
gtcagaagtacttctgttaatagaaagaccttgaaccgcgcattcaaaattctttctaca
caacacggcactgtttccattaaatttggcctacaaaatggaaaaattatgaaaatgaaa 
cccgagtatgaagatttgaagaaaatagctaaaactacaaaacaaccgtttcaagtaatt 
cataacgaggtattacaacaactctatcaaacatatcatataggaaatatacttcaataa 


Sequence 172 

MLLSALVDLGANPEDIESELKKLPLDQFKLHFQKRVKQGIHAMTLNIDVKEANHHRHVND 
IFKMIDDSTLPERVKYRSKKIFEIIGQAEAKIHGMSFEEVHFHEVGAMDSIIDIIGGCIA
LEQLGINTLYCSAIPTGHGKINIAHGIYPIPAPATAEILKGIPIAHFDVQSELTTPTGAA
FAKGLVSSFGPFPSATIQHIGYGAGSKDFDFPNILRVIQFESEFEQQDSVQVIECQIDDM 
TPEALGYFMNNALEQGALDAYYTPIFMKKSRPSTQLTLICKLHDKTYFEQLILQETSSLG 
VRSTSVNRKTLNRAFKILSTQHGTVSIKFGLQNGKIMKMKPEYEDLKKIAKTTKQPFQVI 
HNEVLQQLYQTYHIGNILQ*

Sequence 173 

Contig_0452_pos_3958_3161,

is similar to (with p-value 1.0e-85) 

>sp:sp|P39651|YWFO_BACSU HYPOTHETICAL 51.0 KD PROTEIN IN PTA
3' REGION. >gp:gp|Z99123|BSUB0020_56 Bacillus subtilis compl
ete genome (section 20 of 21): from 3798401 to 4010550. NID:
g2636240. >gp:gp|Z80355|BSUWFO_1 B. subtilis ywfO, ywgA and
ywgB genes. NID: g1561566.

atgatttcctcacaaattgatgctgatcgaatggactatttacaaagagatgcatatttt
acaggcgtaacgtatggctcatttgatatggagcgtattttaaggttgatgagaccatct 
aaagaagaagtgttaattaaagatagtggtatgcatgctgtcgaaaattttattatgagt 
cgttatcaaatgtattggcaaatatattttcatccagtaagccgtggtggggaagtttta 
ttaaacaattgtttaaaacgagctaagcagctttataatgaaggatatgaatttaaaatg 

tatccaaaagactttataccattctttgaaggaacaatgacgattgaacaatatgtagaa
cttgatgaagcagttgtattgtattacttgaagaaatggattcatgaaaatgatacaata 
ttaagtgatttatcaagacggtttatcaatcgagatttatttaaatatattcctttcgac 
ggttcaattattaccatttcggaattgcaagaattatttgaagcgggtggtattaatcct 
gattattactttgtaagtgaagcattttcagatttaccttatgattatgatcgcccaggc 

tcaaatcgcaaaccgattcatttattaaaaagtaatggtggaattacagaaataagtaat
caatcattggtgattaatagtattacagggattaatagagaagaccataaattatattat 
cctaaagagatgattttaaaaattaaagattatcaaattaaaggttctattattaactta 
cttaatgaattaaattaa 

Sequence 174

MISSQIDADRMDYLQRDAYFTGVTYGSFDMERILRLMRPSKEEVLIKDSGMHAVENFIMS 
RYQMYWQIYFHPVSRGGEVLLNNCLKRAKQLYNEGYEFKMYPKDFIPFFEGTMTIEQYVE 
LDEAVVLYYLKKWIHENDTILSDLSRRFINRDLFKYIPFDGSIITISELQELFEAGGINP
DYYFVSEAFSDLPYDYDRPGSNRKPIHLLKSNGGITEISNQSLVINSITGINREDHKLYY 

PKEMILKIKDYQIKGSIINLLNELN*

Sequence 175 

Contig_0453_pos_1385_2191,
is similar to (with p-value 5.0e-38)
>gp:gp|D86240|D86240_5 Staphylococcus aureus gene for unkown
function and dlt operon dltA, dltB, dltC and dltD genes, com
plete cds. NID: g1405333.

atgtcacaaagtcaaatcaatcagatgtttaatcaaaaagatatgccagctaatttgaag 
aaacggtatgcacaaagattgttacagtttccgcatgcacacaataagtcataccttaga 

gaacaagcaaaacatcctaatgatgtctctggaaactacatttcttcatttaaagaaaat
caattaactaagattgaagctattaaatcattattctcattcactaagccacctctagca 
gaagtaaaacctgcaacaagagaagatgcttcatgggatgagatgaaacataaagctgcc 
gatataggcaaagcaaatactcaatctaataaatatgatataagagatccatattggaaa 
ttgataaaacaaaacaagcgtaaaatcaaaagggattatgagttcaacattaactcaccc 

gagttccaagatttaaaattattagtgcaaacgctacatgctgctggagctgatgtacaa
tatgtttgtataccttcaaatggaagatggtatgatcatataggtatcaaaaaagataga 
cgtgaagctgtatataaaaagattcactcaactgtagttgataatggtgggaaaatttat 
gatttgacaaataaggactatgaaaagtacgtaattagtgatgctgttcatattggatgg 
aaaggttgggtttacgtcgaccagcaaattgcaagacatatggatggtcatgcgcctaaa 

aatcatgaagtcgattattctaaaaataaaccaccgcacaaacatcacaacgatcgtcaa
gatgatcaacatcaaggcaacaaataa 

Sequence 176 

MSQSQINQMFNQKDMPANLKKRYAQRLLQFPHAHNKSYLREQAKHPNDVSGNYISSFKEN
QLTKIEAIKSLFSFTKPPLAEVKPATREDASWDEMKHKAADIGKANTQSNKYDIRDPYWK
LIKQNKRKIKRDYEFNINSPEFQDLKLLVQTLHAAGADVQYVCIPSNGRWYDHIGIKKDR
REAVYKKIHSTVVDNGGKIYDLTNKDYEKYVISDAVHIGWKGWVYVDQQIARHMDGHAPK
NHEVDYSKNKPPHKHHNDRQDDQHQGNK* 


Sequence 177 

Contig_0453_pos_2831_3151,

putative peptide of unknown function 

atgactaaaattagtgttgtcgtatatggagcagaagtcgtttgtgcgagttgtgtaaat 
gcacctacatctatagatacttatcaatggcttcaagcattacttttaagaaagtttcct
caacatcattttgaatttacatatattgacatacgaaatgatactgaaaatttaactgat 
catgatatgcaatttatagaaagaattaatgaagatgaattgttttacccattagttacg 
atgaatgatgaatatgtagcagatggttacatacaatataaacaaataacccgttttatt 
aaatcatattttactatgtaa


Sequence 178 

MTKISVVVYGAEVVCASCVNAPTSIDTYQWLQALLLRKFPQHHFEFTYIDIRNDTENLTD
HDMQFIERINEDELFYPLVTMNDEYVADGYIQYKQITRFIKSYFTM* 

Sequence 179

Contig_0453_pos_0_376,

is similar to (with p-value 1.0e-43)

>gp:gp|Y09570|SAFEMD_1 S. aureus femD gene. NID: g1684748. >g
p:gp|Y15477|SAARGFEMD_4 Staphylococcus aureus argl, glmM gen
es and ORF1 and ORF2. NID: g3892891.

atgccatctattccagaaatctttaatatttttggctttaaacggttgtttaaagttagg 
cggtgtcttaatggattattaacgggtgttcagttggcttccgttattaaaatgagtggt 
aaaactctaagcgagttagcttctcaaatgaaaaagtacccacaatctttaattaatgtg 
agagtgactgacaaatatcgtgttgaagagaatattcatgttcaagagataatgacgaaa 

gttgaaacagagatgaatggtgaaggaagaattcttgttcgtccttctggaactgaacct
ttagtacgtgtaatggttgaggctgcaactgacgcggatgctgaaagatatgctcaaagt 
atcgctgacCGCGACA 

Sequence 180 

MPSIPEIFNIFGFKRLFKVRRCLNGLLTGVQLASVIKMSGKTLSELASQMKKYPQSLINV
RVTDKYRVEENIHVQEIMTKVETEMNGEGRILVRPSGTEPLVRVMVEAATDADAERYAQS 
IADRDX 

Sequence 181 
Contig_0454_pos_441_1559,

is similar to (with p-value 0.0e+00)

>gp:gp|Y14370|SAY14370_3 Staphylococcus aureus RF3, murE, yp
fP genes. NID: g3256221. 

atgcaagtcacgcaaagtattgtcaaccaattgaatgagatgaatctcaatcatttatca 
gtcattcaacatgatttgtttatggaagctcatccaattatgacttctatatgtaagaaa
tggtatatcaatagctttaaatattttagaaatacatataaacgattttactatagtcgc 
cctaatgagctcgataaatgtttttataaatattatggattaaataaactcatcaactta 
cttattaaagaaaagcctgatctcatattattaacatttccaacacctgtgatgtcagtg 
ttgaccgaacaatttaatataaatatccctattgcgacagttatgacagattatcgcatg 
cataaaaattggattacaccatattcacaaagatattatgtagcaacaaaagatactaaa
gatgatttcattgaagctggtgttcctgcttcatatattaaagtgacgggcattcctatt 
gctgataaatttgaagaatctattgataaagaagaatggttatcgcaacaacatttagac 
ccttcaaaacctactatattaatgtcagcaggtgcatttggtgtttcaaaaggctttgac 
tatatgattaataatattttagaaaaaagtccaaattcgcaagtggtcatgatttgtgga 
cgtagtaaggaacttaaacgttcattaaaagctaagttcaaagataatccaagtgtaata
atattaggatatacaaatcacatgaatgagtggatggcatcaagccaactaatgattaca 
aaacctggtggtatcacaatttccgaaggacttagtcgttgtattcctatgattttttta
aaccctgcacccggtcaagaacttgaaaatgcatattactttgaaagtaaaggatttgga 
aaaatagcagatactccaaatgaggcaattgatattgtttctgacttaacaaataacgaa
gagactttaaaggttatgtcatctaaaatgctagaatcaaaggtaggatattctactaga
aagatttgtaaagatttattagatttaataggtcactcatctcaaccggatgaaatctat 
ggaaaggttcctttgtatgcaagattcttcgtcaagtaa 

Sequence 182

MQVTQSIVNQLNEMNLNHLSVIQHDLFMEAHPIMTSICKKWYINSFKYFRNTYKRFYYSR
PNELDKCFYKYYGLNKLINLLIKEKPDLILLTFPTPVMSVLTEQFNINIPIATVMTDYRM 
HKNWITPYSQRYYVATKDTKDDFIEAGVPASYIKVTGIPIADKFEESIDKEEWLSQQHLD 
PSKPTILMSAGAFGVSKGFDYMINNILEKSPNSQVVMICGRSKELKRSLKAKFKDNPSVI 
ILGYTNHMNEWMASSQLMITKPGGITISEGLSRCIPMIFLNPAPGQELENAYYFESKGFG
KIADTPNEAIDIVSDLTNNEETLKVMSSKMLESKVGYSTRKICKDLLDLIGHSSQPDEIY 
GKVPLYARFFVK* 

Sequence 183 
Contig_0454_pos_1585_2730,

putative peptide of unknown function 

atgatgttggtaattctgtttctgatggaatttgcaagaggtatgtacatactaagttat 
ataaactttttacctacagtgacctctatcgcaatagcaatcacatcatttgctttttcc 
attcactttatcgcagatgctgcaacaaattttgtcatcggctttttacttaaaaaattt 

ggttcaaaattagtacttacatctggattcttacttgcttttataagcttgtttttagtg
atatggttcccggcatcaccattcataattattttcagtgctattatgttaggaattgct 
gtgagtccgatttgggttatcatgttatctagtgtagatgaaagaaatcgcggcaaacaa 
atgggttatgtctacttttcatggttgctaggtttattggtgggtatggttatcatgaac 
ttgcttattaaattccatcctactcgttttgcatttttaatggccttggttgtgcttatt 

gcctgggtactatactattttgttaatatcaacttaacaaattacaatactaaacctgtg
aaagcacaattaaagcaaattgtagatgttacacaacgtcatcttattctatttccgggt 
atcttgttacaaggagcagctatagcagcacttgtacctattcttccaaaatatgcaacg 
caagttgtgaaagtatcaaccgttgaatatacagtagcaatcattattggtggcataggc 
tgtgctttctctatgttatttttatcaaaaatcatcgacaataatagcaaagggtttatg 

tatggagttatttttagtggctttatactatatacaattcttatattcgggctatctaca
attacaaatatatatatagtttgggccataggactttttattgggctaatgtacggtatc 
ctcttaccggcttggaatacctttatggctgggcatattaatcctaacgaacaggaagaa 
acatggggcgtgttcaacagtgttcaaggcttcggttcaatgataggcccactagtcgga 
ggtctaattactcaatttactaataatttaaataataccttttacttttcagcgatgatt 

tttcttgcacttgcagtattttacggatattactttattaaaacaaacagaagggttaaa
ccttaa 

Sequence 184 

MMLVILFLMEFARGMYILSYINFLPTVTSIAIAITSFAFSIHFIADAATNFVIGFLLKKF 
GSKLVLTSGFLLAFISLFLVIWFPASPFIIIFSAIMLGIAVSPIWVIMLSSVDERNRGKQ
MGYVYFSWLLGLLVGMVIMNLLIKFHPTRFAFLMALVVLIAWVLYYFVNINLTNYNTKPV 
KAQLKQIVDVTQRHLILFPGILLQGAAIAALVPILPKYATQVVKVSTVEYTVAIIIGGIG
CAFSMLFLSKIIDNNSKGFMYGVIFSGFILYTILIFGLSTITNIYIVWAIGLFIGLMYGI
LLPAWNTFMAGHINPNEQEETWGVFNSVQGFGSMIGPLVGGLITQFTNNLNNTFYFSAMI
FLALAVFYGYYFIKTNRRVKP*

Sequence 185 
Contig_0454_pos_4142_0,
is similar to (with p-value 3.0e-66)
>gp:gp|U57060|SAU57060_1 Staphylococcus aureus scdA gene, co
mplete cds. NID: g1575060.

atgaaattcttccctacgtttttcaacatcttcgtaatggaattttgttgtggcggacaa 
gagagtatcgcttcagctgtcaatcataaaccaaatattgacttaaattccttattaaat 
aagttgaatcatattgataatacagaaggtaacagtaccattaatcctaaatttttaaat
gttgaatctcttatacaatatatacaatcagcttatcacgaaacgcttaaagaagaattt
aagaatcttacaccttacatgactaaattggcaaaagtacatggtcctagtcacccatac 
ttattaaaattacaagacttatatcgcgagtttcgtgatagtatgttggatcatatacgt 
aaagaagatgaggaagattttcctaaactcattcaatatagtcaaggacaagatgtacaa 
aacattaaaatcatattagaagatttaattaatgaccacgaagatactgggcaattatta
aatgttatgaatcaactaacctctgattatcaaaccccagaagaagcatgtggaacatgg
aagcttgtttaccaaagattacaaaatatcgaacgtcaaacacaccaacatgtacat 

Sequence 186 

MKFFPTFFNIFVMEFCCGGQESIASAVNHKPNIDLNSLLNKLNHIDNTEGNSTINPKFLN
VESLIQYIQSAYHETLKEEFKNLTPYMTKLAKVHGPSHPYLLKLQDLYREFRDSMLDHIR 
KEDEEDFPKLIQYSQGQDVQNIKIILEDLINDHEDTGQLLNVMNQLTSDYQTPEEACGTW
KLVYQRLQNIERQTHQHVH 

Sequence 187

Contig_0454_pos_4111_3662,

putative peptide of unknown function 

atgggaaaagaaattcttccttatatcgatgcgacatttccaacttataaagtaggtaat 
acaaggttacttattggagatagtttagcaggaagtatcgctttaatgactgcaatgact 

tacccaactatttttagtcgagttgcgttattgagcccaatgtataatgaaaatattaag
aaaaaaattgatacatgtatgaataaaggtcaattgacgatatggcatgccattggttta 
gaagaagcagattttattttaccaactaatggtaaaagagctaactttttaacacctaac 
cgtgaattaaatcaactgattaaagaagataatattgaatatttctataaagaatttaac 
ggtggacatcattggaaatcatggaaaccattgctaggagatattctcttacaattttta 

ggtgatccaataaatggaaaatatgtttaa

Sequence 188 

MGKEILPYIDATFPTYKVGNTRLLIGDSLAGSIALMTAMTYPTIFSRVALLSPMYNENIK
KKIDTCMNKGQLTIWHAIGLEEADFILPTNGKRANFLTPNRELNQLIKEDNIEYFYKEFN 
GGHHWKSWKPLLGDILLQFLGDPINGKYV*

Sequence 189 

Contig_0454_pos_3509_3000,

putative peptide of unknown function 

atgattttaggattagcattggttccgtcaaagtcatttcaagatgaggtgaatgcttat
cgcaagcgatatgacaatcattatgctcaaataatgcctcatatcacgattaaacctcaa 
tttgaaatcgatgatcatgattttaatttaattaaaaatgaagtgaaaaatcgaatttct 
agtattaaaccagtagaagtacatgctacaaaggcatctaatttcgctccaatcagtaat
gttatatacttcaaagttgctaaaacagagtcattagatcaattatttaatcaatttaat 

acagaagatttttacggtacagctgaacatccttttgtaccacattttacaattgcccaa
ggtctaacaagtcaagaatttgaagatatatatggtcaagtaaaattagcaggggtagac 
catagagaaataattgaagaactatcgttacttcaatatagtgaagaagaggacaaatgg 
actattattgaaacttttacattaggataa 

Sequence 190

MILGLALVPSKSFQDEVNAYRKRYDNHYAQIMPHITIKPQFEIDDHDFNLIKNEVKNRIS 
SIKPVEVHATKASNFAPISNVIYFKVAKTESLDQLFNQFNTEDFYGTAEHPFVPHFTIAQ 
GLTSQEFEDIYGQVKLAGVDHREIIEELSLLQYSEEEDKWTIIETFTLG* 

Sequence 191

Contig_0455_pos_5713_5009,

is similar to (with p-value 2.0e-38)

>sp:sp|P17166|TRPA_LACCA TRYPTOPHAN SYNTHASE ALPHA CHAIN (EC
4.2.1.20). >pir:pir|S42347|JS0344 tryptophan synthase (EC 4
.2.1.20) alpha chain - Lactobacillus casei >gp:gp|D00496|LBA
TRP_6 Lactobacillus casei DNA, trp operon (trpD, trpC, trpF,
trpB, trpA), complete cds. NID: g216754.
atgggtgatttaaattttattcatcatttaaaaacattaactgagaatggagcagacatt 
gttgaaattggtgtgccattttctgatcctgttgcagatggacctataatcatgaaagca 

gggcgcaacgctattgacgagggttcaaacattaaattcatttttgatgaattaataaaa
aataaaaatactatttcatctaagtatgtattaatgacttattataatattctaagtgct 
tatggagaagaattatttttggataagtgtgatgaagctggtgtttatggtttaattatt 
ccagatttaccttacgaacttacaaaaaagtttaaaaaagatttttatcatcattctgtt 
aaaataatatcgttaattgccatgaccgcaagtgatgctaggattatgcaaattgcaaag
aactcagaaggatttatttacacggtaacaatgaatgccacaacaggtaacagtggggag
ttccatccagatttaaagagaaaaattgaatatataaaaaaagtttcaaaaattcctgtg 
gttgctggatttggtatcaaaaatcctgaacatgttaaagatatagcgtccgttgcagat 
ggtattgtaattggtagtgaaattgtaaaacgtattgaaatagattcaagaaaagaattt 
atcacttatatcaaatcaataagaactacgttgaattctttataa

Sequence 192 

MGDLNFIHHLKTLTENGADIVEIGVPFSDPVADGPIIMKAGRNAIDEGSNIKFIFDELIK 
NKNTISSKYVLMTYYNILSAYGEELFLDKCDEAGVYGLIIPDLPYELTKKFKKDFYHHSV
KIISLIAMTASDARIMQIAKNSEGFIYTVTMNATTGNSGEFHPDLKRKIEYIKKVSKIPV
VAGFGIKNPEHVKDIASVADGIVIGSEIVKRIEIDSRKEFITYIKSIRTTLNSL*

Sequence 193 

Contig_0455_pos_3811_2633,
is similar to (with p-value 0.0e+00)

>gp:gp|U23713|SEU23713_1 Staphylococcus epidermidis factor e
ssential for methicillin resistance FEMA (femA) gene, comple
te cds. NID: g1815617.

atggaaggtaattacgaattaaaggttgctgaaggtaccgagtcacatttagttggaatt 

aaaaataatgataacgaagtgattgcagcttgtttattaacagctgttcctgtaatgaaa
atatttaaatatttttattccaatcgcggtccagtaatagattataataataaagagctt 
gtacattttttctttaatgaattgagtaaatatgtaaaaaaatataattgtttatattta 
agagttgacccataccttccatatcaatatttaaatcatgagggagaaataactggaaat 
gcaggtcatgattggatttttgatgaattagagagtttaggatataaacacgaaggattc 

cacaaaggatttgatcctgtattacaaatccgatatcattctgttctaaatttagcaaac
aaaagtgctaatgatgttttaaaaaacatggatggtttaagaaagcgtaatactaaaaaa 
gttaagaaaaatggagttaaagtccgctttttatctgaagaagagttacctatatttagg 
tcatttatggaggatacctctgaaactaaagattttgcagatagagaagatagtttttat 
tacaacagattcaaacattataaagaccgtgttttagtaccactagcctatattaacttt 

gatgagtatatagaggaactaaataatgaaagaaatgtgcttaataaagattataataaa
gctttaaaagacattgagaaacgtccagagaataaaaaagcacataacaaaaaggaaaat
ttagaacaacaactcgatgcaaatcagcaaaaaattaatgaagctaaaaacttaaaacaa 
gaacatggcaatgaattacccatctctgctggcttctttataattaatccgtttgaagta 
gtttactacgctggtggaacttcaaatcgttatcgccattttgcagggagctatgcggtt 

caatggaagatgattaactatgcaattgaacatggtattaatcggtataatttctatggt
attagtggtgactttagtgaagatgctgaagatgctggcgtagttaagtttaaaaagggc 
tatgatgccgatgttatagaatacgttggtgactttattaaacctattaataaaccaatg 
tataacatttatagaacacttaaaaaactaaagaaatag 


Sequence 194 

MEGNYELKVAEGTESHLVGIKNNDNEVIAACLLTAVPVMKIFKYFYSNRGPVIDYNNKEL
VHFFFNELSKYVKKYNCLYLRVDPYLPYQYLNHEGEITGNAGHDWIFDELESLGYKHEGF 
HKGFDPVLQIRYHSVLNLANKSANDVLKNMDGLRKRNTKKVKKNGVKVRFLSEEELPIFR
SFMEDTSETKDFADREDSFYYNRFKHYKDRVLVPLAYINFDEYIEELNNERNVLNKDYNK 
ALKDIEKRPENKKAHNKKENLEQQLDANQQKINEAKNLKQEHGNELPISAGFFIINPFEV 
VYYAGGTSNRYRHFAGSYAVQWKMINYAIEHGINRYNFYGISGDFSEDAEDAGVVKFKKG 
YDADVIEYVGDFIKPINKPMYNIYRTLKKLKK*


Sequence 195 

Contig_0455_pos_2607_1354, 

is similar to (with p-value 0.0e+00)

>gp:gp|U23714|SEU23714_1 Staphylococcus epidermidis factor e
ssential for methicillin resistance FEMB (femB) gene, comple
te cds. NID: g1815619.

atgaaatttacagagttaacagttaaagaatttgaaaactttgtacaaaatccatcatta 
gaaagtcattatttccaagtgaaggaaaatattgctacacgtgaatcagatgggtttcaa 
gtagtgttattaggtgtaaaagacgacgacaatagagtgatagcagctagcctgttttct
aaaatccctacaatgggcagttatgtgtattattccaatagaggccctgtaatggactat
tcagatttaggtttagtggatttttatttaaaagagcttgataaatatttacatcaacat 
caatgcttatatgtaaaattagatccttactggttgtatcaagtttatgataaagatatt 
aatcctttaacagaaaaaaatgatgctttagtaaatctatttaaatcacatggttatgat 
catcacggatttacaacccaatatgattcttccagccaagttagatggatgggggtatta
gatttagaaggcaaaacccctgcatctctaaggaaagagtttgatagtcaaagaaaacga 
aatattaataaagcgataaactacggtgtgaaagttagatttcttagtaaggatgaattt 
gatttattcttagacttataccgagagactgaagctagaactggatttgcttctaaaact 
gacgattatttctataactttatagagcattatggcgataaagtattagttcctttagct 

tacatagatttaaatgaatatatacaacatttgcaagaatcactaaatgataaagaaaat
cgacgtgatgatatgatggcgaaagaaaataaaacagataaacagttaaagaaaatagct 
gagttagataaacaaattgatcacgataaaaaagaattgcttcaagctagtgaattacgt 
caaacagatggcgaaattttaaatttagcttcaggagtatactttgctaatgcatatgaa 
gtgaactatttctctggagggtcttcagaaaaatataatcaatatatgggaccatatgca 

atgcattggcacatgattaattattgttttgataacggttatgatagatataatttctat
ggcttatcaggtgattttactgaaaacagtgaagactatggtgtttatcgctttaagaga 
ggttttaatgttaggattgaggaattaatcggtgatttctataaaccaatcaataaagtg 
aaatattggttattcaatacattagatcgcatacgtaataaattgaaaaagtaa 

Sequence 196

MKFTELTVKEFENFVQNPSLESHYFQVKENIATRESDGFQVVLLGVKDDDNRVIAASLFS
KIPTMGSYVYYSNRGPVMDYSDLGLVDFYLKELDKYLHQHQCLYVKLDPYWLYQVYDKDI 
NPLTEKNDALVNLFKSHGYDHHGFTTQYDSSSQVRWMGVLDLEGKTPASLRKEFDSQRKR 
NINKAINYGVKVRFLSKDEFDLFLDLYRETEARTGFASKTDDYFYNFIEHYGDKVLVPLA

YIDLNEYIQHLQESLNDKENRRDDMMAKENKTDKQLKKIAELDKQIDHDKKELLQASELR
QTDGEILNLASGVYFANAYEVNYFSGGSSEKYNQYMGPYAMHWHMINYCFDNGYDRYNFY
GLSGDFTENSEDYGVYRFKRGFNVRIEELIGDFYKPINKVKYWLFNTLDRIRNKLKK* 

Sequence 197 
Contig_0456_pos_6598_8601,

is similar to (with p-value 1.0e-81)

>gp:gp|AE001272|AE001272_20 Lactococcus lactis DPC3147 plasm
id pMRC01, complete plasmid sequence. NID: g3582195.
gtgacaaatgcaacgcctgaacaatataacccttcatataaagaatggaatttagaagac 

ttacctatcattcctaagaaaatgaaaacagtagtgattagtaaaacaaatagacaattt
aaaattgtaaaatctttaattttagataaaaatgttaaagaaattattatagcaacagat 
gctggacgagaaggtgaactagtagctcgtcttattttagataaagtaggtaataaaaaa 
ccaatcaagcgtttgtggattagttcggttacaaaaaaagccatacaagaaggatttaaa 
cagttaaaaaatggaaacgcgtatcaaaatttatatgaagcagcacttgcacgaagtgaa 

gcagattggatagtagggattaatgcaacacgtgcactaacgacaaaatatgatgcacaa
ttatcattaggtcgtgtacaaactccaacaatacaaatagttaaatcaagacaagatgag 
attaactattttaaaccagaaaaatattacacgttatccattaatgttgatggttacgat 
ttaaaccttaagcaacaaaagcgatataaagataaaaaagaattagaattgattgaacat 
aaaattaaacatcaagaaggaaagatattagaagttaaaggaaaaaataagaaatcttac 

gcgcaacctttatttaatttaacagatttacaacaagaggcatataaacattacaagatg
gggccaaaggagacactaaatacattacaacatttatatgagagacataagttagtaacc 
tatccccgtacagattctaattatttaacagatgatatggtcgatacaattcaagaacgg 
ttaagagcaattttagctacagattataaatctcatgttcgagatttaatttctgagtcc 
ttttcttctaaaatgcatatttttaataatcaaaaagtttcagatcatcatgcgattatt 

cccacagaggttagaccatctattgaacaattgagtcaacgagagtttaaaatttatatg
ctaatcgcagaaagatttttagaaaatttaatgaatccttatttatatgaagttttaaca 
atccatgcacaactgaaagattacaattttgttttaaaagagataatacctaaacaatta 
ggatataaagctttaaaagatcaaacctcttcgcatactttaacgcattcttttaaagaa
ggtcagttatttaaagtacatcgtattgagattcatgaacatgaaacaaaggcaccggaa

tattttaacgaaggttcattacttaaagccatggagaatccacaaaatcatattgatttg
aatgataaaaagtatgcaaaaacactcaaacattcgggggggattggaactgtagcaact 
agggctgatattatagaaaagttatttaacatgaatgctttagagtcgcgagatggcaaa 
attaaagttacatcaaaaggaaaacaaattttagaattgtctccaagtgaattaacctca 
cctatactaacagcccaatgggaagaaaaattaatgcttatcgaaaaggggaaatataat 






tctcagaaattcatacaggaaatgaaaaactttacatttaaagtagtaaataaaattaaa 
agcagtgagcaaaaatataaacatgataatttaacaacaaccgagtgcccaacatgtggt 
aagtttatgataaaagtcaaaactaaaaatggacagatgcttgtatgtcaagatcccaaa 
tgtaaaactaagaaaaatattcaacgcaagactaatgcacgttgcccttattgtaagaaa 
aaaatgactttattcggtaaagggaaagaagctgtttatagatgtgtatgtggccacaca
gaaactcaatcacaaatggacaaaagaatgagagataaaacgaatggtaaagtttcacgt 
aaagaaatgaaaaaatatataaataaaaaagaagaaatcgacaataatccattcaaagat 
gctctgaaaaatctcaaattgtag 

Sequence 198

VTNATPEQYNPSYKEWNLEDLPIIPKKMKTVVISKTNRQFKIVKSLILDKNVKEIIIATD
AGREGELVARLILDKVGNKKPIKRLWISSVTKKAIQEGFKQLKNGNAYQNLYEAALARSE 
ADWIVGINATRALTTKYDAQLSLGRVQTPTIQIVKSRQDEINYFKPEKYYTLSINVDGYD 
LNLKQQKRYKDKKELELIEHKIKHQEGKILEVKGKNKKSYAQPLFNLTDLQQEAYKHYKM 

GPKETLNTLQHLYERHKLVTYPRTDSNYLTDDMVDTIQERLRAILATDYKSHVRDLISES
FSSKMHIFNNQKVSDHHAIIPTEVRPSIEQLSQREFKIYMLIAERFLENLMNPYLYEVLT 
IHAQLKDYNFVLKEIIPKQLGYKALKDQTSSHTLTHSFKEGQLFKVHRIEIHEHETKAPE 
YFNEGSLLKAMENPQNHIDLNDKKYAKTLKHSGGIGTVATRADIIEKLFNMNALESRDGK 
IKVTSKGKQILELSPSELTSPILTAQWEEKLMLIEKGKYNSQKFIQEMKNFTFKVVNKIK 

SSEQKYKHDNLTTTECPTCGKFMIKVKTKNGQMLVCQDPKCKTKKNIQRKTNARCPYCKK
KMTLFGKGKEAVYRCVCGHTETQSQMDKRMRDKTNGKVSRKEMKKYINKKEEIDNNPFKD 
ALKNLKL* 

Sequence 199 
Contig_0456_pos_6881_6549,

is similar to (with p-value 1.0e-19)

>gp:gp|AE001272|AE001272_20 Lactococcus lactis DPC3147 plasm
id pMRC01, complete plasmid sequence. NID: g3582195.
atggctttttttgtaaccgaactaatccacaaacgcttgattggttttttattacctact 
ttatctaaaataagacgagctactagttcaccttctcgtccagcatctgttgctataata
atttctttaacatttttatctaaaattaaagattttacaattttaaattgtctatttgtt 
ttactaatcactactgttttcattttcttaggaatgataggtaagtcttctaaattccat 
tctttatatgaagggttatattgttcaggcgttgcatttgtcacaagatgccccaatgcc 
caagttactatatactgtttcccttctatataa 




Sequence 200 

MAFFVTELIHKRLIGFLLPTLSKIRRATSSPSRPASVAIIISLTFLSKIKDFTILNCLFV 
LLITTVFIFLGMIGKSSKFHSLYEGLYCSGVAFVTRCPNAQVTIYCFPSI*

Sequence 201 

Contig_0456_pos_5929_5312,
is similar to (with p-value 4.0e-77)
>gp:gp|Y14043|SXY14043_1 Staphylococcus xylosus gltA, gdh ge
nes. NID: g2226000. 

gtgcatttaataggtgtatctaaaacgatgcctatttcaacgggtatgcaacttgtcggt 
accactctatttagcgctattttcttaggtgaatggagcacgattgttcaagtagtgatg 
ggacttatagcaatgatcttattggttgtaggtatttctttaacatcacttaaagccaaa 

agcgaaggcaaatccgataacccagaatttaaaaaagcaatgggaatattacttctatca
acaatcggttacgtaggttatgtcgttcttggagatatttttggagtaagtggtacagat 
gctctcttcttccaatcaattggtatggcaattggaggattaatcctttcaatgaatcat 
aatacttcaattaaatctactgctctaaatcttataccaggtgttatctggggtatcggt 
aacttatttatgttctattcacaacctaaagttggtgtagcaactagtttctcattatca 

caactgcttgttattgtttcaactttagggggtatctttattctaggggagaaaaaagat
cgtcgccaaatgattggtatttggtcaggtattatcgttatagttatagcttcaatcatt
ttaggcaacttaaaatag 

Sequence 202

VHLIGVSKTMPISTGMQLVGTTLFSAIFLGEWSTIVQVVMGLIAMILLVVGISLTSLKAK
SEGKSDNPEFKKAMGILLLSTIGYVGYVVLGDIFGVSGTDALFFQSIGMAIGGLILSMNH
NTSIKSTALNLIPGVIWGIGNLFMFYSQPKVGVATSFSLSQLLVIVSTLGGIFILGEKKD 
RRQMIGIWSGIIVIVIASIILGNLK* 


Sequence 203 

Contig_0456_pos_5284_4493,

is similar to (with p-value 0.0e+00)

>gp:gp|Y14043|SXY14043_2 Staphylococcus xylosus gltA, gdh ge
nes. NID: g2226000.

gtgtttgaagaattagaaaataaagtggttcttattactggagctgccactggaattggc 
aaatctattgcggaaaattttggtaaagctaaggccaaggttgttataaattaccgttct 
gatcgacatcatgatgaaattgaggaaattaaacaaactgttgctaaatttggtggtcaa 
acattggtggttcaaggtgatgtttcaattgaagaagatattaaacgaatgattgaaaca 

acaattaatcactttggaactttagacattataattaataatgctggattcgaaaattca
atcccaactcatgaaatgtcgattgacgactggcaaaaagttattgacataaacttaact 
ggcgcctttgtgggttcaagagaagccatcaatcaatttttaaaggaaaacaagaaaggt 
actattattaacatttcgagtgttcatgacactattccatggcctaattatgtacactat 
gccgcaagtaaaggtggcttaaaattaatgatggaaacaatgtcaatggaatatgcccaa 

tacggtattcgtattaataatatatctcctggggcaattgttactgaacacactgaagaa
aaattttctgacccaacgacgcgtgaagaaacaataaaaatgatacctgcacgtgaaatt 
ggaaatgctcaagatgtagctaatgcagtactattcctatcttcagatcttgcaagttat 
atacacggtacaacattgtacgttgatggtggcatgatgaactatccagcatttatgggt 
ggtaaaggttaa 


Sequence 204 

VFEELENKVVLITGAATGIGKSIAENFGKAKAKVVINYRSDRHHDEIEEIKQTVAKFGGQ 
TLVVQGDVSIEEDIKRMIETTINHFGTLDIIINNAGFENSIPTHEMSIDDWQKVIDINLT
GAFVGSREAINQFLKENKKGTIINISSVHDTIPWPNYVHYAASKGGLKLMMETMSMEYAQ 
YGIRINNISPGAIVTEHTEEKFSDPTTREETIKMIPAREIGNAQDVANAVLFLSSDLASY
IHGTTLYVDGGMMNYPAFMGGKG* 



Sequence 205

Contig_0456_pos_2714_1809,

putative peptide of unknown function 

atgagtagtactcgtaaaccaaaattagattatgaggaacaaattaaaaagttgaaatca 
ttaggaattctattcaatgaaataacagaagaagaagctaaagaaatattaaaaaataac

acttatttttttaaattgatatcttttcgtaaaaatataaaaaaggatagtagtggaaat
tataattttgagttttctgcactttctgattttgctactttagatatgcgattaagatat 
actttattacctatgtgtttggatatagaacattcactaaaaacagatattcttaaaaag 
attactgatgatgtaaacgaagacggatatacaattgttcaagattttataaacaatcat 
aatggagatttagaaaaaatcttttctagcgtgattaaaagagatggtacagttataccg 

agttttcaaaaatattatgatgatcctccaatatgggtatgcttagaattaatgactttt
ggccaattttcagcatttgtagaattttattctgaaagaacaaatgactctgagttacgt 
aaggctggtaaatttattaaatttgctaaaaacattaggaataaatgtgctcatagccaa 
ccaattttattaaatttaaatccacgcaaaaactttaccgttgaaagagaattaaaaaag
ataggtagaaaacaaagactgtctgataaaaaccttaaagtattagcaataattgatatt 

cttgcattattagttttacattctaaatattgtagtaaaggtataaaagataatcgaaaa
aatgatttattaacttttaaacaacgtaaaaatagatattttcatcattatcgaaatgtt
ccttctctttcatttttctttctatcacttaacaaaatgattgactattatgttcaaaac 
aattaa 

Sequence 206

MSSTRKPKLDYEEQIKKLKSLGILFNEITEEEAKEILKNNTYFFKLISFRKNIKKDSSGN 
YNFEFSALSDFATLDMRLRYTLLPMCLDIEHSLKTDILKKITDDVNEDGYTIVQDFINNH
NGDLEKIFSSVIKRDGTVIPSFQKYYDDPPIWVCLELMTFGQFSAFVEFYSERTNDSELR
KAGKFIKFAKNIRNKCAHSQPILLNLNPRKNFTVERELKKIGRKQRLSDKNLKVLAIIDI
LALLVLHSKYCSKGIKDNRKNDLLTFKQRKNRYFHHYRNVPSLSFFFLSLNKMIDYYVQN
N* 

Sequence 207 
Contig_0456_pos_0_539,

is similar to (with p-value 3.0e-49)

>sp:sp|P39755|NDHF_BACSU NADH DEHYDROGENASE SUBUNIT 5 (EC 1.
6.5.3) (NADH-UBIQUINONE OXIDOREDUCTASE CHAIN 5). >gp:gp|U283
23|BSU28323_1 Bacillus subtilis NADH dehydrogenase subunit 5
(ndhF) gene, complete cds. NID: g903586. >gp:gp|Z99104|BSUB
0001_183 Bacillus subtilis complete genome (section 1 of 21)
: from 1 to 213080. NID: g2632267. >gp:gp|Z99105|BSUB0002_11
Bacillus subtilis complete genome (section 2 of 21): from 1
94651 to 415810. NID: g2632457. >gp:gp|AB006424|AB006424_9 B
acillus subtilis genomic DNA, 70 kb region between 17 and 23
degree. NID: g3599592.
atgcattaccgtaaatattttccgttttttacattaattactgcatttgcttcattggca 
tggttaagtggagacttaaggttaatgaccatgttttggggtgcaacattatttgtgtta 
acacggctcattaaagttaacaaattatggaaggtgcctagggaagcagcaagaatt tea 

gcttggtcatttatattggcatggttgtcgttattgattgctgtcattttattgtatatc
gctacaggagattggtatatttattcgaatatgtcagatgataatgcaatcaattatgga 
atgcgtctctgtatcaatttacttattgttttagctgtgattattccggcggcacaattt 
ccatttcaaggctggcttattgaatctgtagctgcgcctacgccagtttcagctattatg 
cacgctggtattgttaatgctggtggcgttattcttacacgcttttctccggtatttaat 

gacgaaatagccatttcactgttattaattattgcaagtatttcagtattgttgggttc

Sequence 208 

MHYRKYFPFFTLITAFASLAWLSGDLRLMTMFWGATLFVLTRLIKVNKLWKVPREAARIS 
AWSFILAWLSLLIAVILLYIATGDWYIYSNMSDDNAINYGMRLCINLLIVLAVIIPAAQF
PFQGWLIESVAAPTPVSAIMHAGIVNAGGVILTRFSPVFNDEIAISLLLIIASISVLLGS



Sequence 209 

Contig_0457_pos_1064_2419,

is similar to (with p-value 2.0e-79)

>sp:sp|P23545|PHOR_BACSU ALKALINE PHOSPHATASE SYNTHESIS SENS
OR PROTEIN PHOR (EC 2.7.3.-). >pir:pir|A27650|A27650 regulat
ory protein phoR - Bacillus subtilis >gp:gp|AF008220|AF00822
0_180 Bacillus subtilis rrnB-dnaB genomic region. NID: g2293
135. >gp:gp|M23549|BACPHORP_2 Bacillus subtilis alkaline pho
sphatase regulatory protein (phoP gene, 3' end and phoR gene
, complete cds). NID: g143329. >gp:gp|Z99118|BSUB0015_175 Ba
cillus subtilis complete genome (section 15 of 21): from 279
5131 to 3013540. NID: g2635200.

atgcgttacacctataagaatacaatagatgataaaacaatatacataagtggaattaat
aatgaaattattgatttacaaaaagatttatggaaatacttgtctattgttggagtcatt 
gtattatttacggtttatttagcaagtagaagtatcaatcgaacatatattagacctatc 
aatgaagtaacttatgctacatcacttctagcagatggatattaccatgttcgtgttcca 
gaaagtaatgtgaaggaaactagggcattatttgtgactacaaatgacttagcacgacga 

ttgcaaaaattaaacaatagtcaaaaaattcaatccaatagattaaaaactaccttagaa
aatataccgagttcagtactgatgattgataaacatggagaaattgtagttgctaatcat 
gcttattatcaggtgtttaaccctgatcaaatggtagaaaataaaagttacattggtttc 
atagatgatagtattgaaaaattaattattgaaagttttagaactgaaaaagttatctat 
gaacaattagaagttgctattaataacgtacatactaaatatttcgatgtatcttgtatc 

cccattttaactaaatctaaaaaaaatttacaaggtatggtggttgtgcttcatgacatt
actaatttgcagaaattagaaaaccttagaagggaatttgttgcaaatgtgtcacatgaa 
ctaaaaacaccgattacttcaatcaaaggttttgcagaaactctgattgaaggtgctaaa 
aatgatgaacaatcgcttgatatgtttttaaatattattttaaaagaatctaatagaata
gagtcattggttacagacttattagatttatcacatatagaacagcaaaaagaacttgaa
ataaattacatgaatttatctgaattagctattaatataatagataatttgcaaacacaa
gcatacaataagagaatcaaaatacaatctgaaattgaaaaagatgtcatcattgaggca 
catgaaaataaaatagcgcaagttattactaatttgctatcaaatgctataaattattct 
tcagaagataataaggtaatagtaagagtatatagaaatgacaataaagtttatttagag 
attcaagattatggtattggtataagtgaaacagatcaaaagcgtatatttgaacgtttc
tatcgtgtagataaagcgagaagtagagattcaggtggtacaggacttggtctgtctata 
acaaaacatattgttgaagcacataatggtagaatagacgtgaaaagtgcacctggcaaa 
ggttcgatattcaaagttctatttaatgataattaa 

Sequence 210

MRYTYKNTIDDKTIYISGINNEIIDLQKDLWKYLSIVGVIVLFTVYLASRSINRTYIRPI
NEVTYATSLLADGYYHVRVPESNVKETRALFVTTNDLARRLQKLNNSQKIQSNRLKTTLE
NIPSSVLMIDKHGEIVVANHAYYQVFNPDQMVENKSYIGFIDDSIEKLIIESFRTEKVIY
EQLEVAINNVHTKYFDVSCIPILTKSKKNLQGMVVVLHDITNLQKLENLRREFVANVSHE

LKTPITSIKGFAETLIEGAKNDEQSLDMFLNIILKESNRIESLVTDLLDLSHIEQQKELE
INYMNLSELAINIIDNLQTQAYNKRIKIQSEIEKDVIIEAHENKIAQVITNLLSNAINYS
SEDNKVIVRVYRNDNKVYLEIQDYGIGISETDQKRIFERFYRVDKARSRDSGGTGLGLSI
TKHIVEAHNGRIDVKSAPGKGSIFKVLFNDN*

Sequence 211

Contig_0457_pos_3248_4048,

is similar to (with p-value 2.0e-37)

>sp:sp|P13252|DPO1_STRPN DNA POLYMERASE I (EC 2.7.7.7) (POL
I). >pir:pir|A32949|A32949 DNA-directed DNA polymerase (EC 2
.7.7.7) - Streptococcus pneumoniae >gp:gp|J04479|STRPOLA_1 S
.pneumoniae DNA polymerase I (polA) gene, complete cds. NID:
g153764.

atgaaaggtctaatgggggatacctctgacaatattcctggcgttgctggtgtcggcgaa 
aagacggctattaaattacttaatcaatttgagtcagtagaaggggtctatgaacatatt 

gaggaggtcactgcaaaaaaattaaaagaaaaactcatcaatagtaaagatgatgcctta
atgagtaaagatttagcaacaatcaatgttcacagtccgattgaagtatcattagaagat 
acaaaattaactctacaagacgacactacagaaaaaattgaactatttaaaaagctagaa 
tttaaacaactattagcagatatagacacatcctctacgaatgaagaagtcatagataaa 
acttttgaaattgagcaagactttcaaaatgtagatttgaatgatttaaacgaagcggta 

atacattttgaactcgaaggcactaattatcttaaagacactattctcaagtttggtttt
tatacaaatcatcaacatgtagtgataaatgctgaggatgtaaaggattataaacattta
gttcaatggcttgaagataaaaatacaactaaaattgtctatgatgcaaaaaaaacttat 
gtatctgctcatcgattagggattaatatagaaaatattgaatttgatgttatgttagca 
agctatattattgacccatcacgttctattgatgacgttaaatctgtggtaagtttatat 

ggacaaaattatgtaaaagataatattacaatatttgggaaaggtaagaaacatcatata
cctgaatatccctcattttaa 

Sequence 212 

MKGLMGDTSDNIPGVAGVGEKTAIKLLNQFESVEGVYEHIEEVTAKKLKEKLINSKDDAL
MSKDLATINVHSPIEVSLEDTKLTLQDDTTEKIELFKKLEFKQLLADIDTSSTNEEVIDK
TFEIEQDFQNVDLNDLNEAVIHFELEGTNYLKDTILKFGFYTNHQHVVINAEDVKDYKHL 
VQWLEDKNTTKIVYDAKKTYVSAHRLGINIENIEFDVMLASYIIDPSRSIDDVKSVVSLY 
GQNYVKDNITIFGKGKKHHIPEYPSF*

Sequence 213

Contig_0457_pos_4381_5253,

is similar to (with p-value 0.0e+00)

>gp:gp|U02682|HIU02682_1 Haemophilus influenzae KW20 catalas
e (hktE) gene, complete cds. NID: g409459.
gtgattcctgaacgtcgtatgcatgcgaaaggttcaggtgcatttggtacgttcacagtt
acaaatgacatcacacaatatacaaatgcgaaaatattctcagaagtcggaaaacaaaca 
gagatgtttgcacgtttttctactgtttcaggagaacgtggagcagcagatttagaacgt
gatatacgtgggtttgccttgaaattctacactgaagatggaaactgggatttagtaggt 
aacaatacgccagttttcttctttagagatcctaaactatttattagtttgaatcgtgct
gtaaaacgagatccacgtacaaatatgagaagtgcacaaaataactgggacttttggaca
ggtctaccggaagcattgcatcaagtgacaatattaatgtcagatagaggtatgccaaaa 
ggattccgaaatatgcatggattcggttctcatacgtattctatgtataatgataaaggt 
gaacgtgtatgggtaaaatatcatttccgtacacaacaaggaattgaaaactatactgac 
gaggaagcagctaaaattgtaggtatggatagagattcttcacagagggatttatataat
gctatcgaaaatggagattatccaaaatggaaaatgtacattcaagttatgacagaggaa 
caagctaaaaatcatccagacaatccttttgatttaacaaaggtatggtataaaaaagac 
tatccactgattgaagtgggagaatttgaattgaatcgtaatcctgagaattattttctt 
gatgtagagcaggcagcgtttacgcctacaaatattgttcctgggttagattattcacca 
gataaaatgctacaaggacgtttattctcataa

Sequence 214 

VIPERRMHAKGSGAFGTFTVTNDITQYTNAKIFSEVGKQTEMFARFSTVSGERGAADLER 
DIRGFALKFYTEDGNWDLVGNNTPVFFFRDPKLFISLNRAVKRDPRTNMRSAQNNWDFWT 
GLPEALHQVTILMSDRGMPKGFRNMHGFGSHTYSMYNDKGERVWVKYHFRTQQGIENVTD
EEAAKIVGMDRDSSQRDLYNAIENGDYPKWKMYIQVMTEEQAKNHPDNPFDLTKVWYKKD 
YPLIEVGEFELNRNPENYFLDVEQAAFTPTNIVPGLDYSPDKMLQGRLFS* 

Sequence 215 
Contig_0457_pos_6680_5622,

is similar to (with p-value 0.0e+00)

>gp:gp|AF090142|AF090142_1 Staphylococcus epidermidis lipase
precursor (gehD) gene, complete cds. NID: g3789931.
gtgatttttttgaaaaataataatgaaacaagaagatttagcattaggaagtacacggtg 

ggagtcgtgtcaatcattactgggattacaatatttgtcagtggtcagcatgctcaagct
gctgaaatgacacaatcatcatcagattttaacgaacagtcacaacaaacagaacaagtt
gaacacaaagaagatacaactcatttatcatacgaattgaatcaagagggtgacacagct 
agccaatcaaagactaatcaagagaaccaatctgatgaaaatgtacaaaaaaagaataat 
caaactcaacaagattcaacacaaacgtcaccattaaatgaccaagaacaaactttaaag 

gggcaacaatcaaaagacaatcatgttaccccaaattcacgtcaggatacatatccaaaa
ggccaaaatcaagatgataaaggcaaacaacagtttaaagataatcaacactcacaaaca 
gaacatcaacctaatactcaaaaccaaaataatgatcaagattcatcagataaaaagcaa 
cacccatctgatcaaactcaagccccatcttcaaaaggaacacaacctaaacaatcacag 
tctataggagatagagataaaacagtaaaacaaccatcttctaaagtacacaaaataggt 

aatacaaaaactgataaaacagttaaaacaaatcaaaaaaagcaaacatcattaacttca
ccacgcgttgtgaaatcaaaacaaactaaacatatcaatcaacttactgcgcaagctcaa 
tataaaaatcaatatccagtcgtgtttgtacatggatttgtaggtttagtcggtgaagat 
tcattcagcatgtacccaaattattggggtggtactaaatataacgtgaaacaagaactt 
acaaaattaggttaccgagttcacgaagccaatgtaggagcatttagcagcggtgaagtt 

aatttacgattggaccgcgtgtaccatttcaattcgtcttggctcaaaaattcatcgtcg
tacttagtattatcactgatattctcatgtttatcataa 

Sequence 216

VIFLKNNNETRRFSIRKYTVGVVSIITGITIFVSGQHAQAAEMTQSSSDFNEQSQQTEQV
EHKEDTTHLSYELNQEGDTASQSKTNQENQSDENVQKKNNQTQQDSTQTSPLNDQEQTLK
GQQSKDNHVTPNSRQDTYPKGQNQDDKGKQQFKDNQHSQTEHQPNTQNQNNDQDSSDKKQ 
HPSDQTQAPSSKGTQPKQSQSIGDRDKTVKQPSSKVHKIGNTKTDKTVKTNQKKQTSLTS 
PRVVKSKQTKHINQLTAQAQYKNQYPVVFVHGFVGLVGEDSFSMYPNYWGGTKYNVKQEL 
TKLGYRVHEANVGAFSSGEVNLRLDRVYHFNSSWLKNSSSYLVLSLI FSCLS* 


Sequence 217 

Contig_0458_pos_6103_5078,

is similar to (with p-value 0.0e+00)

>sp:sp|P49787|ACCC_BACSU BIOTIN CARBOXYLASE (EC 6.3.4.14) (A
SUBUNIT OF ACETYL-COA CARBOXYLASE (EC 6.4.1.2)) (ACC). >gp:
gp|U36245|BSU36245_2 Bacillus subtilis biotin carboxyl carri
er protein (accB) and biotin carboxylase (accC) genes, compl
ete cds. NID: g1055244.

atgggaataaaagatattgctaaagctgaaatgattaaagccaatgtacctgtagtacca 






ggaagtgaaggacttattcaaagtatagatgacgctaaaaaaatagctaaaaaaatcggc 
tatccagttatcatcaaagccacagcaggtggtggtggaaaaggtattcgggttgctcgt 
gatgagaaagaacttgaaactggttaccgtatgacacaacaagaagctgaaaccgcgttc 
ggaaatggtggtttatacttagaaaaatttatagaaaactttagacatatagagattcaa 

attattggcgatacttatggaaacgttatacatttaggtgaacgtgattgtacaattcaa
agaagaatgcaaaagctcgttgaagaagcaccctcaccagttttaagtgaagataaacgc 
caagaaatgggtaatgctgcaattagagccgcaaaagctgtaaattatgaaaacgcaggt 
acaattgaatttatatatgatttagatgataaccaattttatttcatggaaatgaataca 
cgtattcaagttgaacacccagtaactgaaatggtaacaggagtagatttagtaaaatta 

caactcaaagttgctatgggtgaggcgttaccttttaaacaagaagatatttccattaac
ggtcacgctattgaatttcgaatcaatgctgaaaatccttacaaaaactttatgccatca 
ccaggcaagattacccaatatcttgctccaggcggttttggagtgagaattgaatcagca 
tgttatactaattatacgataccaccttactatgactccatggtggcaaaacttatagtt 
cacgaacctacacgtgaagaatcaattatgacaggcattcgtgctttaagtgaatatctt 

gttttaggtatcgacactacgattccattccacttaagacttctaaataatcatattttt
agaagtggggaatttaatacaaaattcctagaaaagtataatattatggacgataataac 
caatag 

Sequence 218 

MGIKDIAKAEMIKANVPVVPGSEGLIQSIDDAKKIAKKIGYPVIIKATAGGGGKGIRVAR
DEKELETGYRMTQQEAETAFGNGGLYLEKFIENFRHIEIQIIGDTYGNVIHLGERDCTIQ 
RRMQKLVEEAPSPVLSEDKRQEMGNAAIRAAKAVNYENAGTIEFIYDLDDNQFYFMEMNT 
RIQVEHPVTEMVTGVDLVKLQLKVAMGEALPFKQEDISINGHAIEFRINAENPYKNFMPS 
PGKITQYLAPGGFGVRIESACYTNYTIPPYYDSMVAKLIVHEPTREESIMTGIRALSEYL 

VLGIDTTIPFHLRLLNNHIFRSGEFNTKFLEKYNIMDDNNQ*

Sequence 219 

Contig_0458_pos_5066_4704,

is similar to (with p-value 3.0e-18)

>sp:sp|P54519|YQHY_BACSU HYPOTHETICAL 14.7 KD PROTEIN IN ACC
C-FOLD INTERGENIC REGION. >gp:gp|D84432|BACJH642_218 Bacillu
s subtilis DNA, 283 Kb region containing skin element. NID:
g2627063. >gp:gp|Z99116|BSUB0013_144 Bacillus subtilis compl
ete genome (section 13 of 21): from 2395261 to 2613730. NID:
g2634723.

atggtcaatgtagcagattattctcaatctaatttaggaaaaattgaaatagcaccagaa 
gtattatctgttatcgcatccattgcgacatcagaagtagaaggtattacaggccatttt 
gctgaactaaaaaaaacaaatctagagaagattagtcgaaaaaatttaaacagagattta 
aaaatcgaagctaaagaagacggaatatacattgatgtattttgttctttaaaacatggc 
gtaaatatttctaaaactgcaaatcaaattcaagaagcaattttcaattcaattacgaca
atgacagctattgaaccacagcaaattaatattcacatcagaagtatcgtcgcagaaaaa 
taa 

Sequence 220 

MVNVADYSQSNLGKIEIAPEVLSVIASIATSEVEGITGHFAELKKTNLEKISRKNLNRDL
KIEAKEDGIYIDVFCSLKHGVNISKTANQIQEAIFNSITTMTAIEPQQINIHIRSIVAEK
*

Sequence 221 
Contig_0458_pos_4628_4239,

is similar to (with p-value 2.0e-23)

>sp:sp|P54520|NUSB_BACSU N UTILIZATION SUBSTANCE PROTEIN B H
OMOLOG (NUSB PROTEIN). >gp:gp|D84432|BACJH642_219 Bacillus s
ubtilis DNA, 283 Kb region containing skin element. NID: g26
27063. >gp:gp|Z99116|BSUB0013_143 Bacillus subtilis complete
genome (section 13 of 21): from 2395261 to 2613730. NID: g2
634723.

atgagtcgtaaagatgcaagagtacaagcttttcaaactttatttcaacttgaaataaaa 
gagacagatttaacaattcaagaagcaattgaatttattaaagatgatcattctgattta
gactttgattttatatactggttagttactggagtcaaagatcatcaaatcgttttagac
gaaacaattaaaccccatttaaaagactggtctatcgatcgtttactgaaatcagatcgt 
attattttaagaatggcaacttttgaaatattgcacagcgacacacctaaaaaagtagtt 
gttaatgaagctgtagaactcacaaaacagtttagtgatgatgatcattataaatttgtt 
aatggtgttttaagtaatataaatgattaa

Sequence 222 

MSRKDARVQAFQTLFQLEIKETDLTIQEAIEFIKDDHSDLDFDFIYWLVTGVKDHQIVLD
ETIKPHLKDWSIDRLLKSDRIILRMATFEILHSDTPKKVVVNEAVELTKQFSDDDHYKFV
NGVLSNIND*

Sequence 223 

Contig_0458_pos_2587_1805,

is similar to (with p-value 3.0e-69)

>sp:sp|Q08291|ISPA_BACST GERANYLTRANSTRANSFERASE (EC 2.5.1.1
0) (FARNESYL-DIPHOSPHATE SYNTHASE) (FPP SYNTHASE). >pir:pir|
JX0257|JX0257 geranyltranstransferase (EC 2.5.1.10) - Bacill
us stearothermophilus >gp:gp|D13293|BACFDPS_1 B. stearotherm
ophilus DNA for farnesyl diphosphate synthase, complete cds.
NID: g391609.

atgaaatattcattaaatgctggtggtaaaagaatcagaccagtcatattattattaaca 
ctaaaaatgcttaacaaagattatcaacaaggactaaatagtgctttagcattggaaatg 
attcatacttattctttaattcatgatgatttaccagcaatggataatgacgattaccgt
agaggaaaattaacaaatcataaagtttatggtgaatggaaagccattcttgctggtgat 

gcattattaacaaaagcttttgaattagtttctaatgatactaccattgaagatagtgtg
aaagtaagtattataaaaagactttcaaaagcaagtggacatttgggaatggtgggtggc 
caagcgcttgatatggaaagtgaagggaagtcaattcgtttagaaactttagaatcaatt 
catgaaactaagacaggcgctttactaaatttttcagttatggctgcggtagacattgct 
caagtagaacaaaatattgctaagaatttagatgaatttagtcatcatttaggaatgatg 

tttcaaattaaagatgatttactggatgtgtatggtgatgaatcaaaacttggcaaaaaa
gtaggcagtgatatagtaaatcataaaagtacttatgtttctttacttggaaaagaagga 
gcagaagaaaagttaaacaatcatcaatatcttgctatgaactgcttaaatcaaatttct 
gatcaatatgatacttctgaattaagtgatattgtagatttattctataacagagaccat 
taa 


Sequence 224 

MKYSLNAGGKRIRPVILLLTLKMLNKDYQQGLNSALALEMIHTYSLIHDDLPAMDNDDYR 
RGKLTNHKVYGEWKAILAGDALLTKAFELVSNDTTIEDSVKVSIIKRLSKASGHLGMVGG
QALDMESEGKSIRLETLESIHETKTGALLNFSVMAAVDIAQVEQNIAKNLDEFSHHLGMM
FQIKDDLLDVYGDESKLGKKVGSDIVNHKSTYVSLLGKEGAEEKLNNHQYLAMNCLNQIS
DQYDTSELSDIVDLFYNRDH*

Sequence 225 

Contig_0458_pos_0_1022,

is similar to (with p-value 1.0e-74)

>sp:sp|P17894|RECN_BACSU DNA REPAIR PROTEIN RECN (RECOMBINAT
ION PROTEIN N). >pir:pir|B35128|B35128 recN homolog - Bacill
us subtilis >gp:gp|D84432|BACJH642_227 Bacillus subtilis DNA
, 283 Kb region containing skin element. NID: g2627063. >gp:
gp|M30297|BACRECN_2 B. subtilis recombination and sporulation
protein (recN, spoIVB) genes, complete cds, arginine hydro
ximate resistance (ahrC) gene, 3' end. NID: g143400. >gp:gp|
Z99116|BSUB0013_135 Bacillus subtilis complete genome (secti
on 13 of 21): from 2395261 to 2613730. NID: g2634723.

atgttacaaaccttatcaataaaacaatttgccattattgacgaacttgatataaacttt
tctgacggtctaacagttatgagtggtgaaactggctcaggaaaatctatcattattgat
gccattggacagttaatcggtatgagagcttcttctgattacgtcagacatggtgaaaag
aaagcaattatcgaaggtatctttgatatagacgagagtaaagacgcaattaatatacta 
gaatcattagctatagatgttgatgaagattttttattagttaaaagagaaattttcagt
tctggtaagagtatttgtcgtattaataaccaaactgtcactctacaggacttaagaaaa
gtgatgcaagaactgcttgatattcatggtcaacatgaaacgcaatctttacttaagcaa 
aaatatcatcttcaactattagatgattatgcagacaatcagtattcagatttacttaat 
caatatcaactttcttataaccaatataaaaataaacgtaaagaattagaggaattagaa 
tccgcggaccaggctttattacaacgattagacttaatgaaatttcaattagaggaacta
accgaagcttcactgaaagaaggcgaagtggaccaacttgaatccgatattaaaagaatt 
caaaactccgaaaaattaaatctagctttaaacaatgcacatcaagttctaactgatgaa 
agtgcaatacccgataggttgtacgaattaagcaactacttgcaaacgattaatgatatc 
gttccagaaaaattcgtaagattaaaagaggacattgatcaattttactatatgctagaa
gatgcaaagcatgaaatttacgacgaaatggctaacactgaattcgatgagcaagtttta
aatgagtatgaatccagaatgaatttacttaataatttaaaacgtaaatatggtaaggat
attactgaacttattgcttatcagagtaaacttgcaaatgaaattgataaaatagTGGAA
TT 

Sequence 226

MLQTLSIKQFAIIDELDINFSDGLTVMSGETGSGKSIIIDAIGQLIGMRASSDYVRHGEK
KAIIEGIFDIDESKDAINILESLAIDVDEDFLLVKREIFSSGKSICRINNQTVTLQDLRK
VMQELLDIHGQHETQSLLKQKYHLQLLDDYADNQYSDLLNQYQLSYNQYKNKRKELEELE 
SADQALLQRLDLMKFQLEELTEASLKEGEVDQLESDIKRIQNSEKLNLALNNAHQVLTDE 

SAIPDRLYELSNYLQTINDIVPEKFVRLKEDIDQFYYMLEDAKHEIYDEMANTEFDEQVL
NEYESRMNLLNNLKRKYGKDITELIAYQSKLANEIDKIVEX 

Sequence 227 

Contig_0459_pos_802_1155,

putative peptide of unknown function

atgaaattcttaaatataaattctctagctggtacttgcttttcttcaaatcctgctaaa 
aattcttcattattgtacgctaatttcactctctctactgaaattcctttttcattaacg 
gtaataataccatatacggtgtttgaaccattgttcaaacctactgatccaggattaaaa 
tatactgttgatttatcatcaaacatatgcaacctatggttatgtccaaataaaattaaa 

tcggcttctttgtctttaaataattcagaaatagcttgttcgtcatcttttgtaataggt
gcaaaaggttgttcatcaataggagctgacattttatcattttcaatttcataa

Sequence 228 

MKFLNINSLAGTCFSSNPAKNSSLLYANFTLSTEIPFSLTVIIPYTVFEPLFKPTDPGLK
YTVDLSSNICNLWLCPNKIKSASLSLNNSEIACSSSFVIGAKGCSSIGADILSFSIS*

Sequence 229 

Contig_0459_pos_1809_2813, 

putative peptide of unknown function 

atgttagctcaactcggtgaatctactacaaaacctataatatctattatttttatttta
ctcattttagctttattatttatttttgtaggttacccattgataactggtacagtatat 
gcaattcaaaaagctattaataaagagaaagttctttttagtgatttattttttgctttt 
aaaaaaggcaaatatgctaaatcagtaattttagctttaataactttagttttattcatt 
gtaatcgtacttattctagtgctattaaataaattatatagtttagctcttagcccaata 

ttaatcggcttacaacaatcaataagcggctacgacaatccaatgggaattttaattaca
atacaaattgtgttgttactcataacaggtttcatctcatcaattttctattggtttgta
attatattcattattaattatactaccgcttatacagaagattcatctcgtaaagtaatg
agtaatttaaaagaaggatttaaaggtattaaaaacggtaagaaaacttggtttaaattt
ttcattggcgtattacttattagtttacttgcaagtattattaacaaaccgctattattc

ggtgtacaatacttaacaagcagtatgtctcaaacggtggctcaaactattattataata
gctagaatcgtatctatagtattacgcctatgtctttattacattttgatttttggaatt 
attaattatttcgttagacgtggtgacaaaccagtcaaaagcaaaagacgtcataaaaat 
aaagatattaacaaaggtaatgtaaacgacaaagtagatactaaattaaatgcttccaac
tccaaagatacagaagcagataaaatgaaagatcaacaaacacatatacaacaagacaaa 

actgatagtcaagaaaataacatatatgattccatcaaagaaaaagtaaatgaaaataaa
gaaaatgttacagaacaatctaaaaatctatttgataagaaatag 

Sequence 230 

MLAQLGESTTKPIISIIFILLILALLFIFVGYPLITGTVYAIQKAINKEKVLFSDLFFAF
KKGKYAKSVILALITLVLFIVIVLILVLLNKLYSLALSPILIGLQQSISGYDNPMGILIT
IQIVLLLITGFISSIFYWFVIIFIINYTTAYTEDSSRKVMSNLKEGFKGIKNGKKTWFKF 
FIGVLLISLLASIINKPLLFGVQYLTSSMSQTVAQTIIIIARIVSIVLRLCLYYILIFGI 
INYFVRRGDKPVKSKRRHKNKDINKGNVNDKVDTKLNASNSKDTEADKMKDQQTHIQQDK 
TDSQENNIYDSIKEKVNENKENVTEQSKNLFDKK*

Sequence 231 

Contig_0459_pos_5687_4470,

is similar to (with p-value 0.0e+00)

>sp:sp|P39754|GLMS_BACSU GLUCOSAMINE--FRUCTOSE-6-PHOSPHATE A
MINOTRANSFERASE (ISOMERIZING) (EC 2.6.1.16) (HEXOSEPHOSPHATE
AMINOTRANSFERASE) (D-FRUCTOSE-6-PHOSPHATE AMIDOTRANSFERASE
) (GFAT) (L-GLUTAMINE-D-FRUCTOSE-6-PHOSPHATE AMIDOTRANSFERAS
E) (GLUCOSAMINE-6-PHOSPHATE SYNTHASE).

atgttacaaactacaaaccaatacaaagagatacatgaccatgaaatagttattgttaag
cgagacacagtagaaattaaagatcttgaggggcacattcaacaacgtgatacgtatacg 
gcagaaatagatgctgctgatgcagaaaaaggcgtatatgatcattacatgttaaaagaa 
attcatgaacagcctgcagtgatgcgtcgcattattcaagaatatcaagatgaaaaaggt 
aatttaaaaatcgattcagagattattaatgatgtagcagatgctgatcgtatttacatc 

gttgcagctggtactagttatcatgctggattggttggtaaagaatttattgaaaaatgg
gcaggtgtacctactgaggttcatgtagcttctgaatttgtatataatatgccacttctt 
tctgaaaaaccactatttatttatatttcacaatctggtgaaacagctgatagtcgtgct 
gtattagttgaaacaaataagttaggtcacaaatcattaacaattactaatgttgctggt 
tcaacattatcacgtgaagcggatcatacattacttttacatgctggacctgagattgca 

gtcgcatctacaaaagcatatacagcgcaaattgctgttttatctatcttatctcaaatt
gttgctaaaaatcatggtcgtgaaaccgatgttgatttattaagagaactagctaaggtt 
actacagctattgaaacaattgttgacgatgcacctaagatggagcaaattgcaacggat 
ttcttaaaaactactcgtaatgcattcttcattggacgaacaattgattataatgttagt 
ttagaaggtgcattaaaattaaaagaaatttcttatattcaagctgaaggatttgcaggt 

ggggaattaaagcacggaacaatcgctttgattgaagatggcacacctgttataggttta
gctacacaagaaaacgttaatctatcaattcgtggaaatatgaaagaagtagtagcacgt 
ggtgcatatccttgtatgatttcaatggaaggtttgaataaagaaggagacacatacgtg 
attccacaagtacatgaattattaactcctttagtatctgtagtgacaatgcaattaatc 
tcatattatgctgcgttacaacgagatttagatgttgacaaacctcgtaacttagccaaa 

tcggttacagtagagtaa

Sequence 232 

MLQTTNQYKEIHDHEIVIVKRDTVEIKDLEGHIQQRDTYTAEIDAADAEKGVYDHYMLKE 
IHEQPAVMRRIIQEYQDEKGNLKIDSEIINDVADADRIYIVAAGTSYHAGLVGKEFIEKW
AGVPTEVHVASEFVYNMPLLSEKPLFIYISQSGETADSRAVLVETNKLGHKSLTITNVAG
STLSREADHTLLLHAGPEIAVASTKAYTAQIAVLSILSQIVAKNHGRETDVDLLRELAKV
TTAIETIVDDAPKMEQIATDFLKTTRNAFFIGRTIDYNVSLEGALKLKEISYIQAEGFAG
GELKHGTIALIEDGTPVIGLATQENVNLSIRGNMKEVVARGAYPCMISMEGLNKEGDTYV 
IPQVHELLTPLVSVVTMQLISYYAALQRDLDVDKPRNLAKSVTVE* 


Sequence 233

Contig_0459_pos_3987_3118,

is similar to (with p-value 3.0e-19) 

>gp:gp|Z99122|BSUB0019_82 Bacillus subtilis complete genome 
(section 19 of 21): from 3597091 to 3809700. NID: g2636029.
>gp:gp|Z92954|BSZ92954_6 B. subtilis yws[A,B,C,D,E,F,G] and g
erBC genes. NID: g1894764.

atggaggataatatgaaaaatataaaagcaatttttttagatatggatggaacgatttta 
catgaaaataataaagcctcggaatatactaaacaagtgattaatgaattgagagagcaa 
aattataaagttttccttgctactgggagatcttattcagaaatcagtcagcttgttcct
gatggattcactgtagatggtattatcagttcgaatggaacttcaggtgaaattcatgga 
gataatttgtttagacatagtttaactttagaacgagtacagaaaattgtggaattggct 
aaaaaacaacatatttattatgaagtttttccttttgaaagtaatcgtatatctcttaaa 
gaagatgaagattggatgaaagaaatgatttccactatagagccacctgacgctgtaagt 






caaagtgagtggtcatcgcgaagagaggcaattaaaggaaaaatagattggcgagatacc 
ttacctgatgcacacttttctaaaatatatttatttagtcccaacttagataaaataact 
gattttcgcaaccagcttgttgaaaaccaatcaaatttaggtattaccgtatctaattct 
tcgcgttataatgctgaaacgatgccatatcatacagataagggtacaggtatcaaggaa 
atgattgatcactatggtattaagcaagaagaaactttagttattggtgatagtgataat
gatagagctatgtttaattttggccatcacactgttgcaatgaaaaatgcaagacaagaa 
attaaaaatcttacagatgatattaccgaatacacgaacgaagaagatggtgcagcacat 
tacttaaaaagtcatttattagataactag 

Sequence 234

MEDNMKNIKAIFLDMDGTILHENNKASEYTKQVINELREQNYKVFLATGRSYSEISQLVP
DGFTVDGIISSNGTSGEIHGDNLFRHSLTLERVQKIVELAKKQHIYYEVFPFESNRISLK
EDEDWMKEMISTIEPPDAVSQSEWSSRREAIKGKIDWRDTLPDAHFSKIYLFSPNLDKIT
DFRNQLVENQSNLGITVSNSSRYNAETMPYHTDKGTGIKEMIDHYGIKQEETLVIGDSDN 

DRAMFNFGHHTVAMKNARQEIKNLTDDITEYTNEEDGAAHYLKSHLLDN*



Sequence 235 
Contig_0459_pos_1574_798,

putative peptide of unknown function 

atgttagaatatgctaataagataacattgaagggagataatatcatgaaatttgctgtt 
atcactgatattcatggaaactttgatgcgcttcaaactgttttagatgatattgatagt 
agagatgatatcgaaaaaatttataacctaggtgataacatagggattggacatgagaca 

aataaagtactggatactatatttgaccgggatgatatggaaatgattgcaggtaatcat
gatgaagctattatgtcactcgtcaatggaacaccttatcctgaagatttaaaagggaaa 
ttttatgagcatcatcaatggatagaaggacatttagatgagtcctattacgatgaaatt 
aatcaattgcctagatatattgaaatgaccataaaagggaaaaagattttatttattcat 
tatgaaattgaaaatgataaaatgtcagctcctattgatgaacaaccttttgcacctatt 

acaaaagatgacgaacaagctatttctgaattatttaaagacaaagaagccgatttaatt
ttatttggacataaccataggttgcatatgtttgatgataaatcaacagtatattttaat 
cctggatcagtaggtttgaacaatggttcaaacaccgtatatggtattattaccgttaat 
gaaaaaggaatttcagtagagagagtgaaattagcgtacaataatgaagaatttttagca 
ggatttgaagaaaagcaagtaccagctagagaatttatatttaagaatttcatttaa 


Sequence 236 

MLEYANKITLKGDNIMKFAVITDIHGNFDALQTVLDDIDSRDDIEKIYNLGDNIGIGHET
NKVLDTIFDRDDMEMIAGNHDEAIMSLVNGTPYPEDLKGKFYEHHQWIEGHLDESYYDEI
NQLPRYIEMTIKGKKILFIHYEIENDKMSAPIDEQPFAPITKDDEQAISELFKDKEADLI
LFGHNHRLHMFDDKSTVYFNPGSVGLNNGSNTVYGIITVNEKGISVERVKLAYNNEEFLA
GFEEKQVPAREFIFKNFI*

Sequence 237 

Contig_0460_pos_5997_5641,

putative peptide of unknown function

atgtttttaataactttattgcctatttttcaatatcaagcttctgcacatgcgacttta 
gaaaaatcaacaccacaacagcaaggggttattaaagacaaaccagaagcaatcaagtta 
gagtttaatgaacctgtgaacaccaaatactcgagtgtgaccttatttgatgataaaggt 
aaaaagattaaagaccttaaaccaataacaactggatggtctcagacagttgtattttca 

tctgagcaaattgttaatggcacgaatactattgaatggcatacggtatctgcggatgga
catgaagtcggagatacgtttgaattttcagttggaaaagtgaggctaaagatgtag 

Sequence 238 

MFLITLLPIFQYQASAHATLEKSTPQQQGVIKDKPEAIKLEFNEPVNTKYSSVTLFDDKG 
KKIKDLKPITTGWSQTVVFSSEQIVNGTNTIEWHTVSADGHEVGDTFEFSVGKVRLKM*

Sequence 239 

Contig_0460_pos_5440_4823,

putative peptide of unknown function

gtggtttatatgatgacactcacatctgatatattagaagatattctatcatttaaatta
gaagtgataatgcaatttccgtatatattaagctctatttcactaatcattttgtttata 
cttttcattttaaaagatatggaaaaaatatggtactggctcatttcaatagttatgatt 
gctgtgataagtatgtctggacacgtgtggtcacaacaagtgccattatggtcaattatc 
ataagaacaattcatcttatagggctaacgttatggttaggttcactcgtttatctcatt
tgttatgctattaaagtgaaaattaatcagttgacgagtgtaagacgtatgcttttaaaa 
gttaatatcattgctgtgattatgctcgtttttacagggattttaatggctattgatgaa 
acgaatactttaacactttggaataatgtgagcgcttggtctatttatcttgtcataaaa 
atcgcaggaattattgctatgatgctattaggtttctatcaaacgatgcgtgctttgaga 
caacgacaacaggtccatcgttttgcactgatgactgaattgttaattggtatgatatta
attttgcaggtatcatga 



Sequence 240

VVYMMTLTSDILEDILSFKLEVIMQFPYILSSISLIILFILFILKDMEKIWYWLISIVMI 
AVISMSGHVWSQQVPLWSIIIRTIHLIGLTLWLGSLVYLICYAIKVKINQLTSVRRMLLK 
VNIIAVIMLVFTGILMAIDETNTLTLWNNVSAWSIYLVIKIAGIIAMMLLGFYQTMRALR
QRQQVHRFALMTELLIGMILILQVS*


Sequence 241 

Contig_0460_pos_3947_3564,

putative peptide of unknown function 

atggtcataggtttattaagtggcttttactacagagaattaactaaagcgcatgacttt 
gtgggtgacacgcaattgtctttagtgcatacacatacacttatcttaggcatgtttatg
tttttactcttattaccacttgaaaaagtatttaaattaagtagttattacttatttaat 
tggttctttttcgtgtatcatttaggtgtgttaatcacgatttcaatgatgacagttaaa 
ggtacattccaagttattggtaaaaaattttcacccgaaatgtttgcgggatttgcaggc 
ataggtcatacaggtatgcttgcaggtttactgttactgtttttcttattaagacaggct 
attcttacagaacccaaaaaataa

Sequence 242 

MVIGLLSGFYYRELTKAHDFVGDTQLSLVHTHTLILGMFMFLLLLPLEKVFKLSSYYLFN
WFFFVYHLGVLITISMMTVKGTFQVIGKKFSPEMFAGFAGIGHTGMLAGLLLLFFLLRQA 
ILTEPKK*

Sequence 243 

Contig_0460_pos_2387_882,

is similar to (with p-value 4.0e-67)

>sp:sp|P35164|RESE_BACSU SENSOR PROTEIN RESE (EC 2.7.3.-). >
pir:pir|S45560|S45560 hypothetical protein X18 - Bacillus su
btilis >gp:gp|L09228|BACDIA_27 Bacillus subtilis spoVA to se
rA region. NID: g410114. >gp:gp|Z99116|BSUB0013_23 Bacillus
subtilis complete genome (section 13 of 21): from 2395261 to
2613730. NID: g2634723.

atgagcgagaaaccagactcatcgtataaagacactaaaaaacaaatgtttaatgaaata 
aaaaagagtactaaatttaaaaaagtgtttaaagaaggtgagtatgaaactcaaaatatt 
acaataaaaaataaaggtaattctcaatcttatcttttgctaggatacccaatgaaagct 
caaaaaggtgctcaaagtcattatagtggtgtttttatatataaagatttgaaatctatc 

gaagatacaaataatgctattacaattattatattaataactgctattatatttactata
gcaagcacaatatttgcattctttttatctaatagaataacgaaacccttacgtcaatta 
aaaacacaagcacaaaaagtttctgaaggggattatagtcaaatttcaactgtcgctact 
aaagatgaaataggtgatttatcacgtgcatttaacaacatgaacgtagaaattcaagaa 
catatcaaagcaatttcttcatctaagaatataagagatacattattaaactctatggta 

gaaggcgttctaggcattaataatcaacgtgaaatcatattgtcgaacaagatggctgat
gatattatgcgtcacattgatgatttttcaaaagaatctattgaacagcaaattgaagca 
acatttgaatcacaacagaatgagtatttagaattagaaattaatacaaggtactatgta 
tttatctccagttatatagatagaattcaaacaaatggtagaagtggtattgtcatggtc 
atccgtgatatgacaaatgaacataatcttgatcaaatgaaaaaagattttatagcaaat
gtatcacatgaattacgtacgccaatctctttattacagggttacactgagtccatagta
gacggtatagttaccgaaccagatgaaatacgtgactcattagcaatcgttttagatgaa 
tctaagcgacttaatcgtttagtcaatgaattactaaatgtagctcgtatggatgctgaa 
ggattatcagttgagaaggaattacaacctattcaacaccttcttgataaaatggagtct 
aaatatcgcatgcaaagtgaagaattaggtttaacaatgacgtttgattctaataatgac
gaacaattatggaactatgatatggatagaatggaccaagtgttaactaatttaattgat 
aacgcaacaagatatacacaagctggtgattctataaagatttctattgatgaagattca 
gatttcaatattttaacaataactgatacaggcactggtatagcaccggaacatctgaaa 
caagtatttgaccgtttttataaagtggacgctgctcgaaaaagaggtaagcaaggcacc 
ggattaggacttttcatttgtaaaatgattattgaagaacacgggggacgtattgatgtt
gagagcgaattaggcaaaggtacttcatttattattagactacctaaatcaaaacaaatt 
agttag 

Sequence 244 

MSEKPDSSYKDTKKQMFNEIKKSTKFKKVFKEGEYETQNITIKNKGNSQSYLLLGYPMKA
QKGAQSHYSGVFIYKDLKSIEDTNNAITIIILITAIIFTIASTIFAFFLSNRITKPLRQL
KTQAQKVSEGDYSQISTVATKDEIGDLSRAFNNMNVEIQEHIKAISSSKNIRDTLLNSMV 
EGVLGINNQREIILSNKMADDIMRHIDDFSKESIEQQIEATFESQQNEYLELEINTRYYV 
FISSYIDRIQTNGRSGIVMVIRDMTNEHNLDQMKKDFIANVSHELRTPISLLQGYTESIV

DGIVTEPDEIRDSLAIVLDESKRLNRLVNELLNVARMDAEGLSVEKELQPIQHLLDKMES
KYRMQSEELGLTMTFDSNNDEQLWNYDMDRMDQVLTNLIDNATRYTQAGDSIKISIDEDS 
DFNILTITDTGTGIAPEHLKQVFDRFYKVDAARKRGKQGTGLGLFICKMIIEEHGGRIDV
ESELGKGTSFIIRLPKSKQIS* 

Sequence 245

Contig_0460_pos_0_368,

is similar to (with p-value 3.0e-18)

>sp:sp|P50726|YPAA_BACSU HYPOTHETICAL 20.5 KD PROTEIN IN SER
A-FER INTERGENIC REGION. >gp:gp|Z99116|BSUB0013_17 Bacillus
subtilis complete genome (section 13 of 21): from 2395261 to
2613730. NID: g2634723. >gp:gp|L47648|BACSERA_2 Bacillus su
btilis phosphoglycerate dehydrogenase (serA), ypaA, ferredox
in (fer), ypbB, recS, ypbD, ypbE, ypbF, ypbG, ypbH, glutamat
e dehydrogenase (ypcA), ypdA, ypdB, ypdC, spore cortex lytic
enzyme (sleB), ypeB, ypfA, ypfB, cytidine monophosphate kin
ase (cmk), ypfD, ypgA, yphA, yphB, yphC, NAD+ dependent glyc
erol-3-phosphate dehydrogenase (glyc), yphE and yphF genes,
complete cds. NID: g1146195. >gp:gp|L47648|BACSERA_2 Bacillu
s subtilis phosphoglycerate dehydrogenase (serA), ypaA, ferr
edoxin (fer), ypbB, recS, ypbD, ypbE, ypbF, ypbG, ypbH, glut
amate dehydrogenase (ypcA), ypdA, ypdB, ypdC, spore cortex l
ytic enzyme (sleB), ypeB, ypfA, ypfB, cytidine monophosphate
kinase (cmk), ypfD, ypgA, yphA, yphB, yphC, NAD+ dependent
glycerol-3-phosphate dehydrogenase (glyc), yphE and yphF gen
es, complete cds. NID: g1146195.

atgggagaagatggaggttttttgttgttcaaatttcctccttttctaagtaagatgaat 
ggaaggagaaaattatatatgcaacaaaacaaacgtttgattacaattagtatgttaagt 
gcggtagcgtttgtgttaactttcatcaagtttccattgccatttataccaccgtatcta 
actctcgattttagtgatgtaccgacgttattagcaacattcctcttaagtcctattgct 

gggattatcgttgcactcatcaaaaatattttaaattttctattcaatataggggatcct
gttggaccagtagctaactttttagcaggcgtcagctttttgctatcatcatactatgtt 
tatTCTAT 

Sequence 246

MGEDGGFLLFKFPPFLSKMNGRRKLYMQQNKRLITISMLSAVAFVLTFIKFPLPFIPPYL
TLDFSDVPTLLATFLLSPIAGIIVALIKNILNFLFNIGDPVGPVANFLAGVSFLLSSYYV
YSX 

Sequence 247

Contig_0461_pos_160_477,

putative peptide of unknown function 

atggcaaatgcggatgacgttttaagtggtgacagttatttcatgtcagaacttaagcaa 
ctggtacaccagaggtatgtccatcccggtcctctcgtactaaggacagctcctctcaaa 
tttcctacgcccacgacggatagggaccgaactgtctcacgacgttctgaacccagctcg
cgtaccgctttaatgggcgaacagcccaacccttgggaccgactacagccccaggatgcg 
atgagccgacatcgaggtgccaaacctccccgtcgatgtgaactcttgggggagataagc 
ctgttatccccggggtag 


Sequence 248 

MANADDVLSGDSYFMSELKQLVHQRYVHPGPLVLRTAPLKFPTPTTDRDRTVSRRSEPSS 
RTALMGEQPNPWDRLQPQDAMSRHRGAKPPRRCELLGEISLLSPG* 


Sequence 249 

Contig_0461_pos_4273_4671,

putative peptide of unknown function 

gtgcatagttacttacacatttgttcttccctaataacagagttttacgatccgaagacc 
ttcatcactcacgcggcgttgctccgtcaggctttcgcccattgcggaagattccctact
gctgcctcccgtaggagtctggaccgtgtctcagttccagtgtggccgatcaccctctca 
ggtcggctacgcatcaattgtggtgcgttattatttttcatgattttaggtcaggtcaat 
tatttttatggtattattatggcttctagcatgatgataggtgcgttgttaggtgctcaa 
tttgctttgaaaaaaggggtaggatatgtaaaagctttatttttagtggttactgcaata 
ttaattataaaaaatctctacgattttattgtgcagtaa

Sequence 250 

VHSYLHICSSLITEFYDPKTFITHAALLRQAFAHCGRFPTAASRRSLDRVSVPVWPITLS 
GRLRINCGALLFFMILGQVNYFYGIIMASSMMIGALLGAQFALKKGVGYVKALFLVVTAI 
LIIKNLYDFIVQ*

Sequence 251 

Contig_0461_pos_4914_5408,
is similar to (with p-value 1.0e-41)

>gp:gp|L11577|STRSCAA_5 Streptococcus gordonii coaggregation
mediating adhesin (scaA), ATP binding protein, hydrophobic
membrane protein, complete cds, and zinc metalloprotease gen
e, partial cds. NID: g310629.

atgactcaaattacatttaaaaataatcccattaaattatcaggttctgaagtgaatgaa 
ggtgatatcgcaccaaatttcacagtgcttgataatagtttgaatcaaattactttagat
gattataaaaacaaaaagaaattaattagtgttataccatctattgatacaggagtatgt 
gatagtcaaactcgaaagtttaatgaagaagcttcagcagaagatggtgtagttttaacg 
atatcagtagatttacctttcgcccaaaaaagatggtgtgcatcaagcggattagataat 
gtaattactttaagtgatcataaagatttatcttttggtcgaaattatggacttgtgatg 
gatgaattacgcttacttgcacgttcggtatttgtgttaaacgaaaacaataaagtagta
tataaggaaattgtcagcgaaggtacgaattaccctgattttgaagctgcattaaaagct 
tacagaaatatttag 

Sequence 252 

MTQITFKNNPIKLSGSEVNEGDIAPNFTVLDNSLNQITLDDYKNKKKLISVIPSIDTGVC
DSQTRKFNEEASAEDGVVLTISVDLPFAQKRWCASSGLDNVITLSDHKDLSFGRNYGLVM 
DELRLLARSVFVLNENNKVVYKEIVSEGTNYPDFEAALKAYRNI* 

Sequence 253 
Contig_0461_pos_5504_6484,

is similar to (with p-value 7.0e-52)

>sp:sp|P37876|YTXK_BACSU HYPOTHETICAL 37.4 KD PROTEIN IN ACK
A-SSPA INTERGENIC REGION. >gp:gp|AF008220|AF008220_144 Bacil
lus subtilis rrnB-dnaB genomic region. NID: g2293135. >gp:gp
|Z99119|BSUB0016_21 Bacillus subtilis complete genome (secti
on 16 of 21): from 2997771 to 3213410. NID: g2635411.
atggacacttttttaaatagaaagggatattttatgtctgaagaaaatactattatggaa 
cgtctatttcataaattagatgataaagctaaaacgttaaacaaagaaaatggacagagt 
tttatcgaaaatttagggttagctatggaagatatttatacaaaccaaagagaactttta
gaacaagcaacgcttcaagatagaaggaaagcttttcaatttgcatatttaagtttatta 
caagaagaaaatattcaagctaatcatcagatcacgcctgactctataggactcattctc
ggttttcttgttcaacgctttttagaacataaaaaggaaatgcacattgtagatattgca 
agtggggcaggtcatctaagtgcagctgtgaaagaagtactttctgataaaacaattatg 

catcatctgatagaggtagatccagtgctatcacgtgtaagtgtgcatttggctaatttt
ttagagataccgtttgacgtttatcctcaagatgcgattatgccattaccattggaagag 
gctgatgtcgtgattggagatttcccaataggatactatcctttagatgaacgtagtaga 
gaaatgaagttaggctttgaagagggacacagttattcccatcatctgttaatagaacaa 
tctattaatgcgctaaaaggggcaggttatgcatttttagttgttcctagtcatctcctt 

gaagatgataaagtgaaacagttggaaaatttcattgctacagagactgagatgcaagca
tttttaaatttacctaaaacattatttaaaaatgaaaaagcacgtaaatctatattgatt 
ttacaaaagaaaaaatcaggcgaaactcgaccagttgaagtcttattagccaatatccct 
gattttaaaaatcctcaacaatttcaaggtttcatttctgaattgaatcagtggatagtc
acaaatcatacaaaaaaatag 


Sequence 254 

MDTFLNRKGYFMSEENTIMERLFHKLDDKAKTLNKENGQSFIENLGLAMEDIYTNQRELL 
EQATLQDRRKAFQFAYLSLLQEENIQANHQITPDSIGLILGFLVQRFLEHKKEMHIVDIA 
SGAGHLSAAVKEVLSDKTIMHHLIEVDPVLSRVSVHLANFLEIPFDVYPQDAIMPLPLEE 
ADVVIGDFPIGYYPLDERSREMKLGFEEGHSYSHHLLIEQSINALKGAGYAFLVVPSHLL
EDDKVKQLENFIATETEMQAFLNLPKTLFKNEKARKSILILQKKKSGETRPVEVLLANIP
DFKNPQQFQGFISELNQWIVTNHTKK* 

Sequence 255 
Contig_0461_pos_3582_3238,

putative peptide of unknown function 

gtgacaaaccggaggaaggtggggatgacgtcaaatcatcatgccccttatgatttgggc 
tacacacgtgctacaatggacaatacaaagggcagcgaaaccgcgaggtcaagcaaatcc 
cataaagttgttctcagttcggattgtagtctgcaactcgactatatgaagctggaatcg 
ctagtaatcgtagatcagcatgctacggtgaatacgttcccgggtcttgtacacaccgcc
cgtcacaccacgagagtttgtaacacccgaagccggtggagtaaccatttggagctagcc 
gtcgaaggtgggacaaatgattggggtgaagtcgtaacaaggtag 

Sequence 256 

VTNRRKVGMTSNHHAPYDLGYTRATMDNTKGSETARSSKSHKVVLSSDCSLQLDYMKLES
LVIVDQHATVNTFPGLVHTARHTTRVCNTRSRWSNHLELAVEGGTNDWGEVVTR* 

Sequence 257 

Contig_0462_pos_27_440,

is similar to (with p-value 5.0e-18)

>gp:gp|AF012906|AF012906_6 Bacillus subtilis yojP gene, part
ial cds; yojQ/S, yojR, yojT, yojU, yojV, yojW, yojX, yojY, y
ojZ, and yokA genes, complete cds. NID: g2522404. >gp:gp|Z99
114|BSUB0011_163 Bacillus subtilis complete genome (section
11 of 21): from 2000171 to 2207900. NID: g2634230. >gp:gp|AF
020713|AF020713_166 Bacteriophage SPBc2 complete genome. NID
: g3025478.

atgataggtacttatcaaagtgataaaaactttgaaatgatgaagacttttaagcattgg 
attcagactaatcattattggaaatatgttgagaaatacggtgtgttaggtatagcatta 
gataatcctctccacgttcaaagtaatcaatgtagatatgacgttgttttgagaatagat
gaaacagtaaatgatcagacaatatctaaaagagattttacaggtggcatatatgctgtg 
tttaaagttagtcatacaaaaataaatatagagaagttctttagcaatttagaaaatatt 
ttaaatgaaagtcatttgcgtatgagaaatgaaccaattatagagagatacattgaagaa 
gagggaacagataaagtgtgtgaaatgttagtgcctatctatgaagtaaattaa

Sequence 258

MIGTYQSDKNFEMMKTFKHWIQTNHYWKYVEKYGVLGIALDNPLHVQSNQCRYDVVLRID 
ETVNDQTISKRDFTGGIYAVFKVSHTKINIEKFFSNLENILNESHLRMRNEPIIERYIEE
EGTDKVCEMLVPIYEVN*



Sequence 259 
Contig_0462_pos_1824_3440,

is similar to (with p-value 3.0e-53)

>sp:sp|P45082|CYDD_HAEIN TRANSPORT ATP-BINDING PROTEIN CYDD.
>pir:pir|F64186|F64186 transport ATP-binding protein (cydD)
homolog - Haemophilus influenzae (strain Rd KW20) >gp:gp|U3
2795|U32795_6 Haemophilus influenzae Rd section 110 of 163 o
f the complete genome. NID: g1574708.

atggttttaagttataaattataccccacactcatgttaatcatgagcgtttttttatct 
tttacggtcgttgcgcaaaacatttcaatttcacactttttaaatcacttactgtattat 
caacaacaatctttattattattgttatcagttatttttatctctcttattttaagagca 

acatttaatatgctgattcaatttttaggagatcatttggcatttaaagtaaaacatatg
cttagagaacaagtgattttgaaaaaaagtgtccgttcaattggtgaagaaataaatatt 
ttaactgaaagtattgatggtatcggtccgttctttcagagttatttacctcaagtcttt 
aaatcaatgttgattcccatcgttattattattaccatgtgttttgttcatttacctact 
gctattattatgatagttaccgcaccttttattccattgttttatgttatttttggactt 

aaaacaagagatgagtcaaaggatcaaatgacatatttaaaccagtttagtcaacgtttt
ttaaatacagctaaaggtcttattacatttaaacttttaaatcaaacgaaacaatctgag 
caacaactttataaagacagtacacgttttagagatttaacaatgcgtattttgaaaagt 
gcctttttatcaggacttatgcttgagttcataagtatgttagggattggattggtcgca 
ttggaagcggctttaagcttagttgtatttaaccatatcaactttgtgactgcagcgata 

gcgattattttagctcctgaattttataatgcgattaaagatttaggtcaagcatttcat
acaggtaagcaaagtgaaggtgctagcgatgtggtgttttcatttttagaatctgaagat 
aaagctgattctcctacattaaaagtggatgagcaacagtttgaacaagttttaattaag 
catgttgattttcaatacgctaatagtaatcatatggctttgaaaaacatttctttttcg 
gtaaataaaggagaaaaggtcgctattgtgggaccgagtggtgcagggaaatccacttta 

gctaagttgcttagtcaatcagtaacacccacacatggaacactttcatttaaccaagca
tcattaaatatcggatttctaagtcagcgcccacatatatttgcagattctatcaaaaat 
aatattgcaatgtatgatgatgagatatgtgatgagcaagtgattcaagtgcttgatgaa 
gtggggttaaaagagaaagtactttcattaaaatatggtatctatacttctattggtgaa 
ggtggggaaatgttatcaggtggacaaatgagacgtattgagttaagtcgtttattatta 

ttgaaaccagatattgtaatttttgatgaaccagcgataggattagatattgaaactgaa
aaggtcatacaacaagtattagagcatcatttttctacaacgacagtgtttattattgca 
caccgtgattcaaccattcgaagttcagcacggcgtatatatatcgaaagtggtcatctt 
ataaaagatgattcgataatttctgttacgcgtagtgaggtgaagatagatcaatga 

Sequence 260

MVLSYKLYPTLMLIMSVFLSFTVVAQNISISHFLNHLLYYQQQSLLLLLSVIFISLILRA
TFNMLIQFLGDHLAFKVKHMLREQVILKKSVRSIGEEINILTESIDGIGPFFQSYLPQVF 
KSMLIPIVIIITMCFVHLPTAIIMIVTAPFIPLFYVIFGLKTRDESKDQMTYLNQFSQRF
LNTAKGLITFKLLNQTKQSEQQLYKDSTRFRDLTMRILKSAFLSGLMLEFISMLGIGLVA 

LEAALSLVVFNHINFVTAAIAIILAPEFYNAIKDLGQAFHTGKQSEGASDVVFSFLESED
KADSPTLKVDEQQFEQVLIKHVDFQYANSNHMALKNISFSVNKGEKVAIVGPSGAGKSTL 
AKLLSQSVTPTHGTLSFNQASLNIGFLSQRPHIFADSIKNNIAMYDDEICDEQVIQVLDE 
VGLKEKVLSLKYGIYTSIGEGGEMLSGGQMRRIELSRLLLLKPDIVIFDEPAIGLDIETE
KVIQQVLEHHFSTTTVFIIAHRDSTIRSSARRIYIESGHLIKDDSIISVTRSEVKIDQ*


Sequence 261 

Contig_0462_pos_1513_728,

is similar to (with p-value 6.0e-63)

>gp:gp|U87792|BSU87792_1 Bacillus subtilis tRNA-Ala, phospha
tidylglycerophosphate synthase (pgsA) and CinA (cinA) genes,
complete cds, and RecA (recA) gene, partial cds. NID: g1842
434.

atgatacttgtcgatgatatgtggttaaagtcaactaattttctcggttctcaatcagca 
ttcacatttaaagttgttatacagttaggttcagtatttgcggccgcctgggtatttaga
gaacgcttcttagaaattttacatattggccaacataaacctgaaccttccacttcggga 
gacagacgttcaaaaccacgacgtctgaatttaatacatgtattagtaggtatggtccca 
gcagggattttaggatttttatttgatgatttaattgaaaaatacttatttagtgtacca 
acagtcttaattggtttatttataggtgccatttatatgattatagctgataagtattct 

aaaactgttcagcatcctcaaacagtagatcaaattaattatttccaagcatttgtcatt
ggtatctctcaagcaatagctatgtggcctggatttagtagatccggttcaacgatttca 
acaggtgttcttatgaaattgaatcataaagctgcatctgatttcacttttattatgtcg
gtaccaattatgttagctgcaagtggattatctttactaaaacattatgagtatattcat 
ttagcacacataccattctacattttaggatttttagcggcatttattgttggattaatt 

gcaattaaaacattcttacacttaattaataaagttaagttagtaccttttgctatttat
agaattgtcttagttatttttatagcaatcctatacttcggattcggtattggcaaagga 
atttaa 

Sequence 262

MILVDDMWLKSTNFLGSQSAFTFKVVIQLGSVFAAAWVFRERFLEILHIGQHKPEPSTSG
DRRSKPRRLNLIHVLVGMVPAGILGFLFDDLIEKYLFSVPTVLIGLFIGAIYMIIADKYS
KTVQHPQTVDQINYFQAFVIGISQAIAMWPGFSRSGSTISTGVLMKLNHKAASDFTFIMS 
VPIMLAASGLSLLKHYEYIHLAHIPFYILGFLAAFIVGLIAIKTFLHLINKVKLVPFAIY 
RIVLVIFIAILYFGFGIGKGI* 


Sequence 263

Contig_0463_pos_4479_6836,

is similar to (with p-value 0.0e+00)

>gp:gp|M86227|STARECF_3 Staphylococcus aureus DNA gyrase B s
ubunit (gyrB) RecF homologue (recF) and DNA gyrase A subunit
(gyrA) gene, complete cds. NID: g153083.
gtgtcctgtactatggatggtgacggtgcagctgcaatgcgttataccgaagcacgtatg 
actaaaataacattagaacttttacgtgatattaacaaagacacaattgattttattgac 
aactatgatggtaatgaaagagagccgtcagtcttacctgcacgtttccctaacttacta 

gtaaatggtgcggcaggaattgccgtaggtatggctacaaatattcctccccacaattta
actgaagttattgatggtgtgctcagtttaagtaagaatccagacatcacaattaatgag 
ctgatggaagacatacaaggtcctgattttcctacagctggtttagtactagggaaaagt 
ggtattcgtcgagcttatgaaacaggtcgtgggtcaattcaaatgcgttctcgtgctgaa 
atagaagaacgtggtggtggccgtcaacgtattgtcgtaacggaaatacctttccaagtc 

aataaagcgcgtatgattgaaaaaatcgcagagttagttagagataagaaaatcgacggt
attacagatttacgtgatgaaacaagtttgcgtacaggtgtaagagtagttattgatgta 
cgtaaagatgcaaatgcgagtgttattttaaataatttatataaacaaacgccattacaa 
acatcatttggtgtgaatatgattgctttagtgaatggtagacctaaactaatcaattta 
aaagaagcacttatccattacttagaacaccaaaaaacagtggttagacgacgtactgaa 

tataatcttaaaaaagcaagagaccgtgcccatattctagaaggtttacgaatagcacta
gatcatattgatgaaattatcacaacaattcgtgaatcggacactgataaaattgcgatg 
gcaagtttacaagagcgttttaaactaactgaacgtcaagctcaagcaattttagatatg 
cgtttaagacgtttaactggattagaaagagataaaatagaatctgagtataatgaactt 
ctagaatatattaaagagttagaagagattttagctgatgaagaagtactattacaatta 

gttcgtgatgaattgactgaaattaaagaacgtttcggcgatgaacgtcgcactgaaatt
caattaggtggtctagaagatcttgaagatgaagacttaatccctgaagaacaaattgtt 
attacattaagtcataataactatattaaacgtttaccagtatctacatatcgttctcaa 
aatcgtggtggtcgtggcatacaaggtatgaacacgttggatgaggacttcgttagtcaa 
ttggtaacaatgagtacacatgattatgttctgttctttacgaataaaggtcgtgtatat 

aaactcaaaggttatgaagttcctgagttgtcacgtcaatccaaaggcatacctattatt
aatgcgattgaactcgaaaatgacgaaacaataagtacgatgattgcagttaaagacctt
gaaagtgaagaagattatctcgtatttgcgacaaaacaaggtatcgttaaacgttcatca 
ttaagtaacttctcccgtattaacaaaaacggtaaaattgcaattaactttaaagaagat 
gatgaattaattgcagtacgtctaacaacaggtaatgaagatattcttattggaactgca 






catgcatcattaattagattctctgaatctacattacgcccattaggccgtacagcagca 
ggtgtgaaaggtatttctctacgtgaaggggatactgtcgtaggtcttgatgttgcagat 
tcagaaagtgaagatgaagtattagtagttactgaaaatggttacggtaaacgtacacct 
gttagcgaatatcgtttatcaaatcgtggtggtaaaggaatcaaaactgcgacaattacc 
gagcgtaatggtaacatcgtttgtatcacaactgtaaccggtgaagaggatttaatggtt
gtaactaacgctggtgttattattcgtcttgacgttcatgatatttctcaaaatggacgt 
gcagcacaaggtgtacgccttatgaaactcggagatggtcaatttgtttctactgttgct 
aaagtaaacgaagaagacgataatgaggaaaatgcagatgaagcgcaacaatctactact 
actgaaacagcagatgtagaagaggtagtcgatgatcagacaccaggcaatgcgattcat
acagaaggtgatgcagaaatggaatctgtagagtttcctgaaaatgatgatcgtattgat
atcagacaagattttatggatagagtgaatgaagatatcgagagtgcttcagataatgaa 
gaagatagtgatgaataa 

Sequence 264 

VSCTMDGDGAAAMRYTEARMTKITLELLRDINKDTIDFIDNYDGNEREPSVLPARFPNLL
VNGAAGIAVGMATNIPPHNLTEVIDGVLSLSKNPDITINELMEDIQGPDFPTAGLVLGKS
GIRRAYETGRGSIQMRSRAEIEERGGGRQRIVVTEIPFQVNKARMIEKIAELVRDKKIDG 
ITDLRDETSLRTGVRVVIDVRKDANASVILNNLYKQTPLQTSFGVNMIALVNGRPKLINL 
KEALIHYLEHQKTVVRRRTEYNLKKARDRAHILEGLRIALDHIDEIITTIRESDTDKIAM 

ASLQERFKLTERQAQAILDMRLRRLTGLERDKIESEYNELLEYIKELEEILADEEVLLQL
VRDELTEIKERFGDERRTEIQLGGLEDLEDEDLIPEEQIVITLSHNNYIKRLPVSTYRSQ
NRGGRGIQGMNTLDEDFVSQLVTMSTHDYVLFFTNKGRVYKLKGYEVPELSRQSKGIPII 
NAIELENDETISTMIAVKDLESEEDYLVFATKQGIVKRSSLSNFSRINKNGKIAINFKED 
DELIAVRLTTGNEDILIGTAHASLIRFSESTLRPLGRTAAGVKGISLREGDTVVGLDVAD 

SESEDEVLVVTENGYGKRTPVSEYRLSNRGGKGIKTATITERNGNIVCITTVTGEEDLMV
VTNAGVIIRLDVHDISQNGRAAQGVRLMKLGDGQFVSTVAKVNEEDDNEENADEAQQSTT 
TETADVEEVVDDQTPGNAIHTEGDAEMESVEFPENDDRIDIRQDFMDRVNEDIESASDNE 
EDSDE* 

Sequence 265

Contig_0463_pos_8889_10175,

is similar to (with p-value 0.0e+00)

>sp:sp|P95689|SYS_STAAU SERYL-TRNA SYNTHETASE (EC 6.1.1.11)
(SERINE--TRNA LIGASE) (SERRS). >gp:gp|Y09924|SASERS_1 S. aure
us serS gene. NID: g1835217.

atgttagacattcgtttatttagaaatgaacctgagaaagtgaagagcaaaattgaatta 
agaggcgacgatcctaaagttgtcgaccaagttttagaattagatgaacaacgccgtgaa 
ttaatcagtaaaactgaagagatgaaggcgaaaagaaataaagtgagcgaagaaatagct 
caaaagaaacgtaataaagaagacgctgatgatgtcattgctgagatgcgtcatttaggt 

gatgaaattaaagatatcgataatcaacttaatgaagtagataataaaattagagatatc
ttaattcgtattcctaacttaattaatgaagatgtacctcaaggtgattctgatgaagaa 
aacgttgaagttaaaaaatggggtacgccacgtgattttgaatttgaacctaaagcgcac 
tgggatttagttgaagaattaaaaatggctgactttgaacgtgctgctaaagtatctggt 
gctcgtttcgtatacttaactaaagatggcgcattacttgaacgtgctttaatgaattac 

atgttgacaaaacatacaacgcaacatggttatactgaaatgatgacacctcaattagtg
aatgctgatacgatgtttggaacaggtcaattacctaaatttgaagaagatttatttaaa 
gttgaaaaagaaggcttatatacgattccaactgcagaagtacctttaacaaacttctat 
agagatgaaattattcaaccaggtgtactacctgaattatttacagctcaaactgcatgt 
ttccgtagtgaagcaggatcagctggtagagatactagagggttaattcgtttacatcaa 

tttgataaagttgaaatggttcgtattgtacaacctgaagattcttgggatgctttagaa
gaaatgacacaaaatgctgaagctattcttgaagaattaggtttaccataccgtcgtgtt 
atcttatgtactggcgatattggtttcagtgctagtaaaacatatgatttagaagtttgg 
ttaccaagttacaatgattataaagaaatcagttcttgctctaactgtactgatttccaa 
gcacgtcgcgcaaatatcagattcaaacgtgatgctgcttctaaaccagaattagtacac

acattaaatggtagtggtttagcagtaggtcgtacatttgcagccatcgttgaaaactat
caaaacgaagatggtacattaacaattcctgaagcattagtaccatttatgggtggcaaa 
actaaaattgaaaaaccaatcaaataa 
Sequence 266

MLDIRLFRNEPEKVKSKIELRGDDPKVVDQVLELDEQRRELISKTEEMKAKRNKVSEEIA 
QKKRNKEDADDVIAEMRHLGDEIKDIDNQLNEVDNKIRDILIRIPNLINEDVPQGDSDEE 
NVEVKKWGTPRDFEFEPKAHWDLVEELKMADFERAAKVSGARFVYLTKDGALLERALMNY
MLTKHTTQHGYTEMMTPQLVNADTMFGTGQLPKFEEDLFKVEKEGLYTIPTAEVPLTNFY 
RDEIIQPGVLPELFTAQTACFRSEAGSAGRDTRGLIRLHQFDKVEMVRIVQPEDSWDALE 
EMTQNAEAILEELGLPYRRVILCTGDIGFSASKTYDLEVWLPSYNDYKEISSCSNCTDFQ 
ARRANIRFKRDAASKPELVHTLNGSGLAVGRTFAAIVENYQNEDGTLTIPEALVPFMGGK
TKIEKPIK*

Sequence 267 

Contig_0463_pos_12074_12766,
putative peptide of unknown function 

atgactcaccttacgtttaaacaaggtgtgaaagagtgtattcccacgttacttggttat
gcaggtgtaggactatcgtttggaattgtggcagtctcccaaaatttcagtgttttagaa 
attattttattgtgtctgattatttatgctggtgcagctcaatttattatttgtacatta 
gtgattgcaggcacccctatttctgcaattgtgcttacaatacttatcgttaactctcga
atgttcttattaagtatgactttagcacctaattataagcaatatggattttggaatagg 

gtagggcttggaacgttattaacagatgaaacttttggcgttgctataacaccatatgtt
aaaggtgaaaaaattaacgatcgatggctacacggactaaatattactgcttacttattt 
tggactgtttcctgtgtaatcggtgccattttcggagagtatatttcaaatcctgatgcg 
ctcggcctagactttgcaattaccgcaatgtttatttttttatgtatatctcaatttgaa 
gggattaagaaatcacgattgagaatatatattgtactcattgtatgtgtgattgtgatg 

atgcttcttctaagttcaattctaccttcatacctagcaattttaatagccgcaattgtt
gctgcattgttaggggtggtgatggacaaatga 

Sequence 268 

MTHLTFKQGVKECIPTLLGYAGVGLSFGIVAVSQNFSVLEIILLCLIIYAGAAQFIICTL
VIAGTPISAIVLTILIVNSRMFLLSMTLAPNYKQYGFWNRVGLGTLLTDETFGVAITPYV
KGEKINDRWLHGLNITAYLFWTVSCVIGAIFGEYISNPDALGLDFAITAMFIFLCISQFE
GIKKSRLRIYIVLIVCVIVMMLLLSSILPSYLAILIAAIVAALLGVVMDK*

Sequence 269

Contig_0463_pos_13381_14349,

is similar to (with p-value 2.0e-26) 

>gp:gp|Z98271|MLCB1779_3 Mycobacterium leprae cosmid B1779.
NID: g2326678. 

atgacgaattacacggttaatacattagaactaggtgagtttaaaactgaatctggtgaa 

acgattgatcatttacgtctacgttatgaacatgtaggacttcctggtcaaccccttgtc
gttgtttgccatgcacttactggcaatcatttaacatacggcacggatgcacaacctggc 
tggtggcgagaaatcattgacggtggctacattccagttcatgattatcaatttcttaca 
ttcaatgtcattggaagtccatttggttcgagttctaaattaaatgatgataacttccca 
gaacatttaacattgagagatattgttagagctattgagttaggtatacaagcattagaa 

tttaagaaaattaatattctcattggaggtagtttaggtggtatgcaagcgatggaattg
ctttataatcgtcaattcgaggtggaaaaagcaatcatattagctgctactgataaaacg 
tcctcttatagtcgtgcttttaacgagattgcaagacaagctatacatataggcggtaaa 
gaaggtttaagtattgcacgtcaactcggctttctcacgtatcgatcgtctaaaagttat 
gatcaacgttttacaccagatgaagtagtgagctatcaacaacatcaaggtgataagttc 

aaagaatatttcgatttaaattgttatttaacactgctagacgtcttagatagtcaccat
ttagatagaggaagagatgatgttgatgaagtctttcagtcgttggaaacgaaagtacta 
acaatgggttttattgacgatttgctttatcctgatgatcaagtgagagccttaggagaa 
cgttttaaatatcatcgtcatttcttcgtaccagataatgtgggacatgatggttttctt 
ctaaattttaatgattgggcgcctaatttatatcatttcttaaatttgaaacaattccga 

cgtaaatag



Sequence 270 
MTNYTVNTLELGEFKTESGETIDHLRLRYEHVGLPGQPLVVVCHALTGNHLTYGTDAQPG
WWREIIDGGYIPVHDYQFLTFNVIGSPFGSSSKLNDDNFPEHLTLRDIVRAIELGIQALE
FKKINILIGGSLGGMQAMELLYNRQFEVEKAIILAATDKTSSYSRAFNEIARQAIHIGGK 
EGLSIARQLGFLTYRSSKSYDQRFTPDEVVSYQQHQGDKFKEYFDLNCYLTLLDVLDSHH 
LDRGRDDVDEVFQSLETKVLTMGFIDDLLYPDDQVRALGERFKYHRHFFVPDNVGHDGFL
LNFNDWAPNLYHFLNLKQFRRK* 

Sequence 271 

Contig_0463_pos_14578_15504,

putative peptide of unknown function

atgggagtggcgtatgtgttttcaaaaatacaacctaaagcaactattttagcaattatt 
tcattattagtagtcgctttagtaacacatgtattacctgttctcggcttgattttatgt 
ttatttgcaacgattcccggtattgttttgtggaatcgttccatacaatcattcggaatt 
agtgcattagtaacagttgtacttacaacattattaggtaatacatttgtcttaagtatg 

atggtcttaatcttattattaagtgcgattatcggacaattacttaaggaaagaacatct
aaagaacgaattctttatatttcaacagcttcattaagtttagttacacttattggatgg 
atgttattacaaacattcgataaaattccgacggcagctgtattaattaagcctttaaaa 
aatgcaatgcatgaggctttcttaaaaagtggaatcgattcaaactatagacagattctt 
gaggaaagtttccgacaaatgacggtccaactccctagttttctaattatagttattttc 

atttttgtcttaattaatctgattattacatttccaattttacgcaaatttaaagtagca
acacctatttttaaacctttattcgcatggcaaatgagccgtaatttactatggttttat 
cttatagtacttatttgtgtcatgattgcgagtgaaccaagtacgttccaaagcatcgtg 
ttaaactttgatgttgtgttatcattagtgatgtacatccaaggattaagtgtcattcac 
ttctttggtaaagctaaaagatggccgaactttgcaacaattcttgttatggtagtaggg 

acgcttcttacaccggcaacgcatattgttggattacttggggtaattgatttatgtatt
aatttaaagaaaataataaaaaaatga 

Sequence 272

MGVAYVFSKIQPKATILAIISLLVVALVTHVLPVLGLILCLFATIPGIVLWNRSIQSFGI
SALVTVVLTTLLGNTFVLSMMVLILLLSAIIGQLLKERTSKERILYISTASLSLVTLIGW
MLLQTFDKIPTAAVLIKPLKNAMHEAFLKSGIDSNYRQILEESFRQMTVQLPSFLIIVIF
IFVLINLIITFPILRKFKVATPIFKPLFAWQMSRNLLWFYLIVLICVMIASEPSTFQSIV
LNFDVVLSLVMYIQGLSVIHFFGKAKRWPNFATILVMVVGTLLTPATHIVGLLGVIDLCI
NLKKIIKK* 

Sequence 273

Contig_0463_pos_15515_17500,

is similar to (with p-value 0.0e+00) 

>sp:sp|P37484|YYBT_BACSU HYPOTHETICAL 74.3 KD PROTEIN IN RPL
I-COTF INTERGENIC REGION. >gp:gp|D26185|BAC180K_10 B. subtil
is DNA, 180 kilobase region of replication origin. NID: g467
326. >gp:gp|Z99124|BSUB0021_156 Bacillus subtilis complete g
enome (section 21 of 21): from 3999281 to 4214814. NID: g263
6442.

gtgattagaggtggaagaatgaaccgtcaatccactaaaaaagctttgctcataccgttt
attttaatggtgctcactgctatagcacttgtcgccgtgtggtttatttttaaccaacta 
gtggcaggtattgctacagctatacttattgtgatgattattattagtggcgtgttattg 
agaaaagcatttctaaaaatggataattatgtggatgatttaagtggtcacatctcggca 
agtagtaacaaggcgattaagcacttgcctatagggatgattgtgttagatgaagataat 

cacattgagtggatgaaccaatttatgacagatcacattgaaacgaatgtgatttctgaa
aatgtcaatgaagtcttccctaacatattaaaacaactggaaaaagttcaagaagtagaa 
atagaaaacaacaattattactatcatgtacgatattcagaaaacgagcattgtttatac 
ttctttgatatgactgaaactgaacgtacaaacgaactatatgaagattcaaaaccgatt 
attgcaacaatatttttagataattacgatgaaatcactcaaaacatgaacgatacacaa 

cgttctgaaattaactctatggtgacacgtgtgattagtcgttgggcacaggattacaat
atttacttcaaacgttacaactcagatcaatttgtagcttactttaaccaaaaaatattg 
gctgaattagaagattctaattttgaaatcttaagccaattaagagaaaagagtgtgggt 
taccgcgcacaactaacattaagtattggtgtaggtgaaggtactgagaaccttattgat 
ttaggtgaattatcacaatctggtttagacctcgcgttaggtcgtggtggtgaccaagtt 
gcaattaagaatatgaacggcaatgtaagattctatggtggtaagactgaccctatggaa
aaacgtacgcgtgtacgtgcgcgtgtgatttcacatgccctcaaagatattcttactgaa 
ggcgataaagttatcgttatgggacataagcgaccagatttagatgctataggtgcagct 
atcggagtttcgcgctttgcatcaatgaataatttagaggcatttatcgttcttaatgat 
tctgatattgatccgacattacgtcgtgttatggacgagattgataagaaaccggaacta
aaagaacgctttgtaacatcggatgaggcttgggatatgatgacttctaagacgactgtc 
gttgttgtagatacacataaacctgaaatggtcttagatgaaaatgtcttaaataaagca 
aaccgcaaagtagtcattgatcatcatagacgtggcgaaagctttatttcaaatccatta 
cttgtgtatatggaaccttacgctagctcaactgctgagctcgtaacggaattactagaa 

tatcaaccaactgaacagagattgactcgtttagaatcaactgtcatgtatgcaggtatt
atagtagatacaagaaactttactttaagaacaggttccagaacatttgatgccgcaagt 
tatttacgtgcacatggcgctgatacaatcttaacgcagcatttcttaaaagatgatgtc 
gatacgtatatcaatcgttcagaattgataagaacagttaagatacaagatcaaggtgta 
gccattgcacatggttcagatgataaaatttatcatcctgtaacggttgcacaagctgcc 

gacgagttgttaagtttagaaggcattgaagcatcttatgtagtagctaaacgtgaagac
aacctgatcggtatctcagcacgttcattaggttccataaatgttcaattaacaatggaa 
gcgttaggtggcggtggccatctgacaaatgctgcgacacaaataaaaggtgcgacaata 
gatgaagcaatagaacaattacaacaagcaattacagaacaaatgagtaggagtgaagac 
gcatga 

Sequence 274 

VIRGGRMNRQSTKKALLIPFILMVLTAIALVAVWFIFNQLVAGIATAILIVMIIISGVLL
RKAFLKMDNYVDDLSGHISASSNKAIKHLPIGMIVLDEDNHIEWMNQFMTDHIETNVISE 
NVNEVFPNILKQLEKVQEVEIENNNYYYHVRYSENEHCLYFFDMTETERTNELYEDSKPI 

IATIFLDNYDEITQNMNDTQRSEINSMVTRVISRWAQDYNIYFKRYNSDQFVAYFNQKIL
AELEDSNFEILSQLREKSVGYRAQLTLSIGVGEGTENLIDLGELSQSGLDLALGRGGDQV 
AIKNMNGNVRFYGGKTDPMEKRTRVRARVISHALKDILTEGDKVIVMGHKRPDLDAIGAA 
IGVSRFASMNNLEAFIVLNDSDIDPTLRRVMDEIDKKPELKERFVTSDEAWDMMTSKTTV
VVVDTHKPEMVLDENVLNKANRKVVIDHHRRGESFISNPLLVYMEPYASSTAELVTELLE 

YQPTEQRLTRLESTVMYAGIIVDTRNFTLRTGSRTFDAASYLRAHGADTILTQHFLKDDV
DTYINRSELIRTVKIQDQGVAIAHGSDDKIYHPVTVAQAADELLSLEGIEASYVVAKRED
NLIGISARSLGSINVQLTMEALGGGGHLTNAATQIKGATIDEAIEQLQQAITEQMSRSED 
A* 

Sequence 275

Contig_0463_pos_18114_19523,

is similar to (with p-value 0.0e+00)

>gp:gp|AF045058|AF045058_1 Bacillus mojavensis DnaC replicat
ive helicase (dnaC) gene, partial cds. NID: g3282820. 

atggatggaatgtatgagcaaaatcaaatgccgcatagcaatgaagctgaacaatctgtc
ttaggtgccattattatagatccagaactcattaatactactcaggaagtcttgcttcct 
gagtcgttttatagaggcgcccatcaacatatttttcgagcaatgatgcacctaaatgag
gataataaagaaattgatgttgtcacattgatggatcaattatcaagtgaaggtagctta 
aacgaagcgggtggccctcaatatctcgccgaactatcgacgagtgtaccgacaacgcga 

aatgttcagtactatacggatatcgtttttaaacatgcgttgaaacggaaacttattcaa
accgctgatagtatagcgaatgatggctataatgatgaattagaattagatacgatttta 
agtgacgccgaacgacgtattttagaactatcttctacaagagaaagtgatggttttaaa 
gatattagagatgtcttaggacaggtatatgaaaccgcagaagaactcgaccaaaatagt 
ggtcaaacaccaggtattccaactggttatcgtgacttagaccaaatgactgctggtttt 

aatcgtaatgatttaattattctagcggcacgtccttcagtaggtaagactgcctttgcc
ttaaatattgcgcaaaaggttgccacacatgaagatatgtatactgtcggtatcttctca 
cttgagatgggcgccgaccaattggcgacacgtatgatttgtagttctggtaacgttgat
tccaatcgtttaagaactgggacgatgactgaagaagattggagtcgctttacgattgcg 
gtaggtaagctatcacgaactaaaatcttcatagacgatacgccaggtatccgtatcaat 

gatttacgttctaaatgtcgtcgactcaaacaagagcacggtcttgatatgattgtgatt
gattatctacaattgattcaaggaagcggatcacgtttctcagataaccgtcaacaagaa 
gtttcggagatttcacgtacacttaaggcgattgcacgtgaattagaatgtccagttatt 
gcactgagtcagctatcacgtggcgttgaacagcgacaagacaaacgtcctatgatgagt 
gatattcgtgaatctgggtctatagaacaagatgccgacatcgtcgctttcttgtatcgt 
gatgattattataatcgtggtgaaggtgatgaagatgatgacgatgctgacgatgctggt
tttgaaccacagacaaatgatgataacggtgaaattgaaatcatcatcgccaagcagcgt 
aatggtccaacaggtactgtgaaacttcactttatgaaacaatacaataaatttacagat 
attgattatgctcatgcagatatgtcataa 

Sequence 276

MDGMYEQNQMPHSNEAEQSVLGAIIIDPELINTTQEVLLPESFYRGAHQHIFRAMMHLNE
DNKEIDVVTLMDQLSSEGSLNEAGGPQYLAELSTSVPTTRNVQYYTDIVFKHALKRKLIQ 
TADSIANDGYNDELELDTILSDAERRILELSSTRESDGFKDIRDVLGQVYETAEELDQNS 
GQTPGIPTGYRDLDQMTAGFNRNDLIILAARPSVGKTAFALNIAQKVATHEDMYTVGIFS
LEMGADQLATRMICSSGNVDSNRLRTGTMTEEDWSRFTIAVGKLSRTKIFIDDTPGIRIN 
DLRSKCRRLKQEHGLDMIVIDYLQLIQGSGSRFSDNRQQEVSEISRTLKAIARELECPVI 
ALSQLSRGVEQRQDKRPMMSDIRESGSIEQDADIVAFLYRDDYYNRGEGDEDDDDADDAG 
FEPQTNDDNGEIEIIIAKQRNGPTGTVKLHFMKQYNKFTDIDYAHADMS*

Sequence 277 

Contig_0463_pos_19769_0,

is similar to (with p-value 3.0e-77) 

>sp:sp|P29726|PURA_BACSU ADENYLOSUCCINATE SYNTHETASE (EC 6.3
.4.4) (IMP--ASPARTATE LIGASE). >gp:gp|M83690|BACADESYN_1 Bac
illus subtilis adenylosuccinate synthetase (purA) gene, comp
lete cds. NID: g142442.

gtggttaaacgagaaaaacttggaggtgctcatatgtcatcaatcgtagtagttgggaca 
caatggggagacgaaggtaaaggtaaaataacagactttttagcagagcaagcagacgta 

attgctagattttctggtggtaacaatgcgggacatacgattcaatttggtggagaaact
tacaaattacacttagtaccatcaggtatcttttataaagataaattagcagtaatcggt 
aacggtgtagttgtagatccagtcgcattattaaaagaattagatgggttaaatgaacgt 
ggcatttcaactgacaacctacgcatctcaaatcgcgcacaagtcattttaccttatcac 
ctagctcaagacgaatatgaagaacgtcgtcgtggcgataataaaatcggtacaacgaaa 

aaaggtattggcccagcatacgtagataaagcacaacgtatcggtattcgcatggcagat
ttattagaaaaggaaacattcgaacgccgacttaaagaaaatattgaatataaaaatgca 
tactttaaaggcatgtttaacgaaacttgtccaacattcgatgaaatctttgacgaatac 
tatgctgcaggtcaacgtttaaaagactatgtgacagacacagc 

Sequence 278

VVKREKLGGAHMSSIVVVGTQWGDEGKGKITDFLAEQADVIARFSGGNNAGHTIQFGGET
YKLHLVPSGIFYKDKLAVIGNGVVVDPVALLKELDGLNERGISTDNLRISNRAQVILPYH 
LAQDEYEERRRGDNKIGTTKKGIGPAYVDKAQRIGIRMADLLEKETFERRLKENIEYKNA 
YFKGMFNETCPTFDEIFDEYYAAGQRLKDYVTDTA 

Sequence 279 

Contig_0463_pos_18985_18680,

is similar to (with p-value 2.0e-31) 

>gp:gp|AF045058|AF045058_1 Bacillus mojavensis DnaC replicat
ive helicase (dnaC) gene, partial cds. NID: g3282820.

atgaagattttagttcgtgatagcttacctaccgcaatcgtaaagcgactccaatcttct 
tcagtcatcgtcccagttcttaaacgattggaatcaacgttaccagaactacaaatcata 
cgtgtcgccaattggtcggcgcccatctcaagtgagaagataccgacagtatacatatct 
tcatgtgtggcaaccttttgcgcaatatttaaggcaaaggcagtcttacctactgaagga 
cgtgccgctagaataattaaatcattacgattaaaaccagcagtcatttggtctaagtca
cgataa 

Sequence 280 

MKILVRDSLPTAIVKRLQSSSVIVPVLKRLESTLPELQIIRVANWSAPISSEKIPTVYIS 
SCVATFCAIFKAKAVLPTEGRAARIIKSLRLKPAVIWSKSR*

Sequence 281 

Contig_0463_pos_11710_10706,

is similar to (with p-value 2.0e-89) 
>sp:sp|P09978|PHLC_STAAU PHOSPHOLIPASE C PRECURSOR (EC 3.1.4
.3) (BETA-HEMOLYSIN) (BETA-TOXIN) (SPHINGOMYELINASE). >pir:p
ir|S15766|S15766 beta-hemolysin - Staphylococcus aureus >gp:
gp|X13404|SAHLB_2 Staphylococcus aureus hlb gene for beta-he
molysin. NID: g46586.

atgaaacgaggtgtaacaatattgaattggcaacgtaaatgtatactaactactttgttg 
gttttaagtagtttatttttagtattttcgactatcacatatgcgagtgaacgtgatttt 
aaagacagtcttaaaatcactacacataacgtgtatttcttacctactgctatctaccct 
aattggggacaatctcagcgcgctgatttaatttcaaaagcagattacattcaaaatcaa 

gatgtcgtgattctaaatgaattatttgataaaaaagcttcaaaaagattgttaacacgt
ctacattcacagtacccttatcaaacacctatcgttggtaaaggtacagaaggttggcaa 
aatacttctggtacttatagaaaaattaaaaaagtaagtggtggcgttggtattgtgagt 
aaatggcctatcgtacaacaagaacaacatatttataaaaaaggttgtggggctgatatg 
gcaggtaataaaggctttgcctacattaaaattaataagaatggcaaataccaccatatt 

atcggaacacatctacaagctgaagatccaacatgctttaaaggaaaagataaagacatt
agacagagtcaaatgagtgaaattaaacagtttatcaaagacaagaatatccctaaaaat 
gaacccgtctatatcggtggtgacttaaatgtcattaaagattcagatgaatatcaacaa 
atggcaaataacttaaatgtttcattacctactcaattcgatggtaatgcatatagttgg 
gatactagcagtaatagtattgcgaaatataattatcctaaattagaacctcaacactta 

gattatattttattagatcatgaccatgcacaaccaagctcatggcataatgatacacat
agagtgaagtcaccagaatggtccgtgaaatcttggggaaaaacatacaaatacaatgat 
tactcagatcattacccactctcaggctatgcatcaaatgaatag 

Sequence 282 

MKRGVTILNWQRKCILTTLLVLSSLFLVFSTITYASERDFKDSLKITTHNVYFLPTAIYP
NWGQSQRADLISKADYIQNQDVVILNELFDKKASKRLLTRLHSQYPYQTPIVGKGTEGWQ
NTSGTYRKIKKVSGGVGIVSKWPIVQQEQHIYKKGCGADMAGNKGFAYIKINKNGKYHHI
IGTHLQAEDPTCFKGKDKDIRQSQMSEIKQFIKDKNIPKNEPVYIGGDLNVIKDSDEYQQ 
MANNLNVSLPTQFDGNAYSWDTSSNSIAKYNYPKLEPQHLDYILLDHDHAQPSSWHNDTH 

RVKSPEWSVKSWGKTYKYNDYSDHYPLSGYASNE*

Sequence 283 

Contig_0463_pos_4418_3897,

is similar to (with p-value 7.0e-52) 

>sp:sp|P12012|GNTP_BACSU GLUCONATE PERMEASE. >pir:pir|A26190
|A26190 gluconate permease - Bacillus subtilis >gp:gp|AB0055
54|AB005554_2 Bacillus subtilis genomic DNA, 36 kb region be
tween gnt and iol operons. NID: g2280496. >gp:gp|J02584|BACG
NT_3 B. subtilis (gluconate operon) gntR, gntK and gntP genes
encoding gnt repressor, gluconate kinase and permease, and
gntZ gene. NID: g143013. >gp:gp|Z99124|BSUB0021_112 Bacillus
subtilis complete genome (section 21 of 21): from 3999281 t
o 4214814. NID: g2636442.

atgcttatcgcagtgatatttgcaatctttacaatgggaatgaagcaacaacggaaaatg 
gaagacattatgaaatcagttacgcatgctatttatccaatcggcatgatgttactcatc
atcggtggtggtggtacatttaaacaagtgctcatcgatggtggcgtaggtgatacaatc 
gctaagatgtttgaaggaacaagcatgtcgcccattttattagcatggattgtagctgca 
gtcttaaggatttcattaggatcagctacagttgctgccgtatcaacaacaggcattgtg 
ttaccacttttagaacattcagatgttaatgtagctttggtcgttcttgcaataggtgca 
ggtagcgtaattctctctcacgtcaatgatgctggattctggatgtttaaagaatatttc
gggctgacagttaaagaaacatttttaacatggtcgttattagagacaattatttcagta 
tctggtattttatttattttatttatcagtttatttgtatag 

Sequence 284 

MLIAVIFAIFTMGMKQQRKMEDIMKSVTHAIYPIGMMLLIIGGGGTFKQVLIDGGVGDTI
AKMFEGTSMSPILLAWIVAAVLRISLGSATVAAVSTTGIVLPLLEHSDVNVALVVLAIGA
GSVILSHVNDAGFWMFKEYFGLTVKETFLTWSLLETIISVSGILFILFISLFV* 

Sequence 285 
Contig_0463_pos_3576_2782,

is similar to (with p-value 0.0e+00) 

>sp:sp|Q05852|GTAB_BACSU UTP--GLUCOSE-1-PHOSPHATE URIDYLYLTR
ANSFERASE (EC 2.7.7.9) (UDP-GLUCOSE PYROPHOSPHORYLASE) (UDPG
P) (ALPHA-D-GLUCOSYL-1-PHOSPHATE URIDYLYLTRANSFERASE) (URIDI
NE DIPHOSPHOGLUCOSE PYROPHOSPHORYLASE). >pir:pir|A40650|A406
50 UTP--glucose-1-phosphate uridylyltransferase (EC 2.7.7.9)
- Bacillus subtilis >gp:gp|L12272|BACGTABX_1 Bacillus subti
lis UDP-glucose pyrophosphorylase (gtaB) gene, complete cds.
NID: g289286. >gp:gp|Z22516|BSLYTGTA3 B. subtilis lytR, orf
X, and gtaB genes. NID: g405620. >gp:gp|Z99122|BSUB0019_64 B
acillus subtilis complete genome (section 19 of 21): from 35
97091 to 3809700. NID: g2636029.

atgccaaaagaaatgttaccaatattagataaaccaacaattcaatatattgtagaagaa 
gcttttaatgcaggaatagaagatattattatagtgactggcaagcataaacgtgcaatt
gaggatcactttgacaatcaaaaagaactagagatagtacttgaaagtaaaggaaaagca 
gatttacttgaaaaagtacaatattcaacagatttagctaatattttttacgtgcgacaa 
aaagaacaaaaagggctaggacatgcaattcatactgcaaaacagtttataggtaacgaa 
ccatttgcagtgttattaggagatgacattgtagagtctgatacaccagctattaaacaa 
ttaatggatgtttatgaagaaacaggccattcagtaataggtgttcaagaagtgccagaa
tctgatacacatcgctatggtgtgattgatccttctgctaaagagggaagtcgatatgaa 
gtacgtcaatttgtagaaaagccgaaacaaggtactgccccgtctaatttagcaatcatg 
ggtcgttatgtattaacaccagaaatttttgattatcttgaaacacaacaagaaggtgct 
ggaaatgaaattcaattaactgatgcgattgaacgaatgaatagcaaacaacaagtgtat 
gcatatgattttgagggtaatcgttatgatgttggagaaaaattagggtttgttaaaaca
acgattgaatatgctttaaaagatccagaaatgagtcaagacttaaaagcattcattaaa 
caactagatatttaa 

Sequence 286 

MPKEMLPILDKPTIQYIVEEAFNAGIEDIIIVTGKHKRAIEDHFDNQKELEIVLESKGKA
DLLEKVQYSTDLANIFYVRQKEQKGLGHAIHTAKQFIGNEPFAVLLGDDIVESDTPAIKQ 
LMDVYEETGHSVIGVQEVPESDTHRYGVIDPSAKEGSRYEVRQFVEKPKQGTAPSNLAIM 
GRYVLTPEIFDYLETQQEGAGNEIQLTDAIERMNSKQQVYAYDFEGNRYDVGEKLGFVKT
TIEYALKDPEMSQDLKAFIKQLDI*

Sequence 287 

Contig_0464_pos_1580_2050,

is similar to (with p-value 4.0e-35) 

>sp:sp|Q02134|HIS7_LACLA IMIDAZOLEGLYCEROL-PHOSPHATE DEHYDRA
TASE (EC 4.2.1.19) (IGPD). >pir:pir|G45734|G45734 HisB - Lac
tococcus lactis subsp. lactis >gp:gp|U92974|LLU92974_6 Lacto
coccus lactis unknown gene, partial cds, and HisC (hisC), un
known, HisG (hisG), unknown, HisB (hisB), unknown, HisH (his
h), HisA (hisA), HisF (hisF), HisIE (hisIE), unknown, unknow
n, LeuA (leuA), LeuB (leuB), LeuC (leuC), LeuD (leuD), unkno
wn, IlvD (ilvD), IlvB (ilvB), IlvN, IlvC (ilvC), IlvA (ilvA)
, AldB (aldB) and aldR (aldR) genes, complete cds. NID: g256
5137.

atgttaacgctatttacttttcatagtggattaactttatctattgaggccactggagat 
acgtatgttgatgatcatcatataactgaagatataggtatagttattggacaattactt
cttgaattaataaagactcaacaaagttttacaagatatggttgctcatatgtacccatg 
gatgaggcgcttgctcgaacagtagtggacattagtggtcgtccatatttctcatttaat 
agcaagttgagcgctcaaaaggtaggaacttttgacactgaactagttgaagaatttttt 
agagcattgataattaatgcgcgattaaccgttcacattgacttattaagaggtggaaat 
acacatcatgagattgaggcaatatttaaatcttttgcaagagcattaaagatttctctt
gcacaaaatgaagatggacgtattccatcgtctaaaggagtaattgaatga 

Sequence 288 

MLTLFTFHSGLTLSIEATGDTYVDDHHITEDIGIVIGQLLLELIKTQQSFTRYGCSYVPM 
DEALARTVVDISGRPYFSFNSKLSAQKVGTFDTELVEEFFRALIINARLTVHIDLLRGGN
THHEIEAIFKSFARALKISLAQNEDGRIPSSKGVIE* 

Sequence 289 
Contig_0464_pos_2149_2625,

is similar to (with p-value 1.0e-29) 

>sp:sp|Q02132|HIS5_LACLA AMIDOTRANSFERASE HISH (EC 2.4.2.-).
>pir:pir|I45734|I45734 HisH - Lactococcus lactis subsp. lac
tis >gp:gp|U92974|LLU92974_8 Lactococcus lactis unknown gene
, partial cds, and HisC (hisC), unknown, HisG (hisG), unknow
n, HisB (hisB), unknown, HisH (hish), HisA (hisA), HisF (his
F), HisIE (hisIE), unknown, unknown, LeuA (leuA), LeuB (leuB
), LeuC (leuC), LeuD (leuD), unknown, IlvD (ilvD), IlvB (ilv
B), IlvN, IlvC (ilvC), IlvA (ilvA), AldB (aldB) and aldR (al
dR) genes, complete cds. NID: g2565137.

gtgcaaaaagctgaagctatcgtacttccaggtgttggacattttcaggatgcgatgcat 
tctatagaagaaaaaagcattaaagatatgcttaaaaatatacatgataaaccgataatt 
ggaatatgtttaggtatgcaattactttttcaacatagcgcagaaggtgacgttagtgga 
ttggaacttgtcccgggaaatatagtgccaatccaatcatctcatcctattcctcatttg 

ggttggaatgaattaaagagtacacatcccttactgcaaagtgatgtgtattttgttcat
tcatatcaagcagaaatgtcagaatatgtagtagcttatgctgactatggtacaaagatt 
ccgggagtcattcaataccgaaattatataggtatccagtttcatcctgaaaaaagtgga 
acgtatggattagagattctaaatcaagcgcttaaaggagggtttattaatgattga 

Sequence 290

VQKAEAIVLPGVGHFQDAMHSIEEKSIKDMLKNIHDKPIIGICLGMQLLFQHSAEGDVSG 
LELVPGNIVPIQSSHPIPHLGWNELKSTHPLLQSDVYFVHSYQAEMSEYVVAYADYGTKI
PGVIQYRNYIGIQFHPEKSGTYGLEILNQALKGGFIND* 

Sequence 291

Contig_0464_pos_3334_4077,

is similar to (with p-value 5.0e-69) 

>sp:sp|O34727|HIS6_BACSU HISF PROTEIN (CYCLASE). >gp:gp|Z991
21|BSUB0018_173 Bacillus subtilis complete genome (section 1
8 of 21): from 3399551 to 3609060. NID: g2635827. >gp:gp|AF0
17113|AF017113_41 Bacillus subtilis 300-304 degree genomic s
equence. NID: g2618830.

gtgattccatgtttagatgttaaagatggacgcgtcgtaaagggtatccagttccagtca 
ttaagagatatcggtaatccagttgatttggctctttattataatgaagccggtgcagat 

gaactagtctttcttgatatttcgaagacggaagcaggacatgatcttatgatagaagtg
atagaagcaacggcaaaacaattatttatccctttaacagtaggaggagggattcaaaat 
ttagatgatattacgcaactattaaatcacggagcagataaaatatcactcaattcaagc 
gctttaaaacatccagaattaattcgacaagcaagcgagaaatttggtcgtcaatgtatt 
tgtattgctattgatagcttttatgataaagacagaaaggattatttctgtactacgcac 

ggtggtaaaaaattaactgatgtcagggtatatgattgggtacaagaagtagagctttta
ggtgctggggaattgcttataactagcatgcatcatgatggaatgaaacaaggttttgat 
attgaacatttagcaaaaattaaacaattagttaatattccgattattgcctctgggggt 
ggaggaaatgcacaacattttgttgaactatttcaacaaacagatgtttcggcaggttta 
gcggcaagtattttacatgatcaagaaactacagtggcagaaattaaagataaaatgcgt 

gaaggaggtatccttgtgagatga

Sequence 292 

VIPCLDVKDGRVVKGIQFQSLRDIGNPVDLALYYNEAGADELVFLDISKTEAGHDLMIEV
IEATAKQLFIPLTVGGGIQNLDDITQLLNHGADKISLNSSALKHPELIRQASEKFGRQCI 
CIAIDSFYDKDRKDYFCTTHGGKKLTDVRVYDWVQEVELLGAGELLITSMHHDGMKQGFD
IEHLAKIKQLVNIPIIASGGGGNAQHFVELFQQTDVSAGLAASILHDQETTVAEIKDKMR
EGGILVR* 

Sequence 293 
Contig_0465_pos_9467_9787,

putative peptide of unknown function 

atgtaccggcgatggtatctttttcaactacagctacttgcttaccactttgctttaatg 
ttaacgcagcatgccaagctgcatgaccacttcctaaaaatactacatcatattgtttca 
ttaacactcatcctttcttatttttctatgagatgttttaatgtttgctctagttcttca
aacacatatttcgtttcatcatactcagtcgtatcaaattcttgtggtaagcaacttgaa 
attgcttcaaaaacagcttcttgttgttgttgcccattgtcagttaacgtaattatcaac 
tgtcgtttatcagattgttga 

Sequence 294

MYRRWYLFQLQLLAYHFALMLTQHAKLHDHFLKILHHIVSLTLILSYFSMRCFNVCSSSS 
NTYFVSSYSVVSNSCGKQLEIASKTASCCCCPLSVNVIINCRLSDC* 

Sequence 295 
Contig_0465_pos_11617_11937,

putative peptide of unknown function 

atgagtgataatacaccaccaataaagaacatcctcattacatcaaagatactaatgttt 
ttaaatgcattagattcgaagaagaagattaatcctgaaataggaactaatagtgcgcca 
ataaatatcatacctggtatggcattggcattaccaaatatatttgttaatatccataac 
gctaggaaagtgatacctaaagctaaaaatacacgtgagaacacccaaggtcttccccac
tcctcggaaacttcattaatatgtggcgtcgtacgttttgttccagcaataaatacatca 
tccgcttcatctttggtgtga 

Sequence 296 

MSDNTPPIKNILITSKILMFLNALDSKKKINPEIGTNSAPINIIPGMALALPNIFVNIHN
ARKVIPKAKNTRENTQGLPHSSETSLICGVVRFVPAINTSSASSLV*

Sequence 297 

Contig_0465_pos_15548_16303,

putative peptide of unknown function

atgtttaaagtagttatttgtgatgatgaaaggattataagagaaggcttaaagcaaatg 
gttccatgggaggactatcatttcaccactgtttatactgccaaagacggcgtggaagca 
ttgtctttaattcgccaacatcaacctgaactcgtcattactgatatacgaatgcctcga 
aaaaatggtgttgacctactagatgacatcaaagaccttgattgccagattatcatttta 

tcgagttatgacgacttcgaatatatgaaagccggtatacaacatcatgttcttgattat
ttactaaagccagtagaccacactcagttagagcatattctagacatattagttcaaagg 
ttattagaacgcccacattctaccaatgatgacgcggcatatcatactgcctttcaacca 
ttattaaaaattgattacgatgactattatgtcaatcaaattttgtctcaaatcaagcaa 
cattatcacaagaaagtgactgttcttgacttaattaatcctattgatgtaagtgagtca 

tacgccatgaggacgtttaaagaacatgtaggcattacgatagttgattatctaaatcgt
tatcgtattttaaaatcattacatcttttagaccagcactacaagcattatgaaattgct 
gaaaaagtaggtttttctgagtataaaatgttttgctatcattttaaaaaatatttacat 
atgtcaccaagtgattataataagcaatcaaaatag 

Sequence 298

MFKVVICDDERIIREGLKQMVPWEDYHFTTVYTAKDGVEALSLIRQHQPELVITDIRMPR
KNGVDLLDDIKDLDCQIIILSSYDDFEYMKAGIQHHVLDYLLKPVDHTQLEHILDILVQR 
LLERPHSTNDDAAYHTAFQPLLKIDYDDYYVNQILSQIKQHYHKKVTVLDLINPIDVSES 
YAMRTFKEHVGITIVDYLNRYRILKSLHLLDQHYKHYEIAEKVGFSEYKMFCYHFKKYLH 

MSPSDYNKQSK*

Sequence 299

Contig_0465_pos_14779_13595,
putative peptide of unknown function
atgcaaacagtcggaattataccttcgccaggtatagcacatcaacatgcaaaaaaaata
attccaaatgttaaacagttattgtcaaagcgtactaaacatagtcaatggaatttcgac 
atcaaagtcgatctcatgataggatctgcagaggatgtacatgaaagtgttgaaaaagca 
gcacaaattaaagaggaacatcagtgggattacgttgtttgtctgacagatttgcctagt 
atttcagataataaagtggttgtcagcgactttaatagtgacaaacatgttgcaatgcta 
tcattaccgtcactaggttttattgatttgaagcgcaagctagttaaaacgatgacttca
ttgattgaacaattatattataatcaaccgaaagacaaaaatgcgccacatccttttgta 
cgcgtgaaggctgtagaacctgacgaagacgccacatcaaaacaacgatatattaatatt 
ttatttatcataagttggattcagttaattggtggactgacacgagcaaatcagccttgg 
aaaaacatctttaattttaagaaaatcatttcagttgcctttgcaacaggaacttatgtc
tcaatattttcaatgccatgggaattaagcgtgatttattcaccgcttcgacttatcata 
ttgatggtgattgctatacttgggatggctggatggctattctatgcgcatcaattgatt 
gaaaagaaaactgctaaatctcagcgtgtatatcgatatatttataattcaaccacactt 
gttacactaagtttgattacactcataaattatgtcattttatatttattgttaatcatc 

agtattacactctttgtccctgtggaattatttaatagttggacgagtgcccaatcacaa
tttacgttctcaaattatatgagattgatttggtttgtatcatcattaggacttttagct 
ggagctatgggatcaactgttgaaaatgaagagaaaatacgtcgtattacttattcttat 
agacaatatcatcgttataaagaagctgagcaagaacaaaaagaacaagaaacttctcgt 
gatgtatcacaacaaaatgtcgaacaacaaacttcaagtaaagatgaaaataatgaacaa 

tatgaaggtaaaaaacaaggacatagagaggaggatgacgcatga

Sequence 300 

MQTVGIIPSPGIAHQHAKKIIPNVKQLLSKRTKHSQWNFDIKVDLMIGSAEDVHESVEKA 
AQIKEEHQWDYVVCLTDLPSISDNKVVVSDFNSDKHVAMLSLPSLGFIDLKRKLVKTMTS 
LIEQLYYNQPKDKNAPHPFVRVKAVEPDEDATSKQRYINILFIISWIQLIGGLTRANQPW
KNIFNFKKIISVAFATGTYVSIFSMPWELSVIYSPLRLIILMVIAILGMAGWLFYAHQLI
EKKTAKSQRVYRYIYNSTTLVTLSLITLINYVILYLLLIISITLFVPVELFNSWTSAQSQ 
FTFSNYMRLIWFVSSLGLLAGAMGSTVENEEKIRRITYSYRQYHRYKEAEQEQKEQETSR 
DVSQQNVEQQTSSKDENNEQYEGKKQGHREEDDA* 

Sequence 301 

Contig_0465_pos_13580_12489,
putative peptide of unknown function 

gtgggtctagtcgtcgctccaggtgttactgaacgccttgcagaaaatctcatacaagaa 

atgcctaaaatgttatctacgcattatgatcatcagcaagaatggatttttgatttagtt
actgatccgcttactggttttgctgaatctgtagatgaaatttttgagaaagtagccgat 
tatcacgataagagacaatgggattatgtgatagcaattacagatttaccgatgtttgct 
gacaaacaagtgatggcattagatattaatatggaaaatggtgcagctatattctcatat 
ccggcatttggctggcgtccagtaaaaaaacgtttcaagcatgcgatttataatattatt 

caagaattaaatgaagctgaacaagaaagtcgtaattatgataataataatcaaatagaa
aattcagtaaaaaaacaatttccgctctctaaaatagacaaagaaacaatatatatgaaa 
gaaacagactcttatcacttaagatatttatcaagttcacgttctagaggcatgtttcgc 
cttgttagtggaatgacatttgcgaataatccattaaatatgatggcaagtttaagtaat 
atagtagctattgcatttactacaggtgcatttggacttgtatttacaacgatgtggcaa 

atggcttataacttttcaatgtggcgtttatttggaatttcaattattgcgattattgga
atgctaatatggataatgatgtcacatgatttatgggaaccagttaataaaagcaaccat 
aagcgtattacttggttatacaatcttacaacaataatgacattgatttttgccattata 
atttattatattattctttatttactattcttaattgctgaaatcgtattattgccatca 
ggatttttaggtcagcaagttggattaaaaggtcctgcaggcattgatttatatttaagt 

attccatggtttgcagcttcaatttcgacagttgcaggtgcaataggtgctggtttactt
aatgatgaactcattaaagaaagcacatatggatatcgtcagcgtgtaagatacgaagaa 
caacgtcgataa 

Sequence 302 

VGLVVAPGVTERLAENLIQEMPKMLSTHYDHQQEWIFDLVTDPLTGFAESVDEIFEKVAD
YHDKRQWDYVIAITDLPMFADKQVMALDINMENGAAIFSYPAFGWRPVKKRFKHAIYNII 
QELNEAEQESRNYDNNNQIENSVKKQFPLSKIDKETIYMKETDSYHLRYLSSSRSRGMFR
LVSGMTFANNPLNMMASLSNIVAIAFTTGAFGLVFTTMWQMAYNFSMWRLFGISIIAIIG 
MLIWIMMSHDLWEPVNKSNHKRITWLYNLTTIMTLIFAIIIYYIILYLLFLIAEIVLLPS

55 GFLGQQVGLKGPAGIDLYLSIPWFAASISTVAGAIGAGLLNDELIKESTYGYRQRVRYEE 
QRR* 

Sequence 303 

Contig_0465_pos_11648_11019,
putative peptide of unknown function

atgttctttattggtggtgtattatcactcattagtacaatgattttatatcaatttgtt 
acatttagtactgaatcgcaatattatggaattatgactataacagatgcgttcatagta 
ggatttgttgaagagcttggaaaagcaactgttgttattttatttattaattatttaaaa 
acaaataaaatactcaatggattacttatcggtgctgctgttggtgcagggttcgcggtg
tttgaatcagctggttatatctttaggtttggatttaatttatttgatggagttaataat 
attactgaaatcactatacaaagaggttggacagctttaggtagtcacctcgtttgggca 
gctattgttggtgctgcggcagtaatagtgaaagaaacaaagcatttcgaatgggcgaat 
atcatcgataaacgttttatatttttcttttttgtggcagtgacattacacggaatatgg 
gatacggaaataacacttttaagtagtggttatttaaaatatatcttattaattatgatt
gcatggctatttatatttatacttatgaaagcagggctaactcaggtgaatcagttgcgt 
gatgaatacaatcgtttagaggaaaggtga 

Sequence 304 

MFFIGGVLSLISTMILYQFVTFSTESQYYGIMTITDAFIVGFVEELGKATVVILFINYLK
TNKILNGLLIGAAVGAGFAVFESAGYIFRFGFNLFDGVNNITEITIQRGWTALGSHLVWA 
AIVGAAAVIVKETKHFEWANIIDKRFIFFFFVAVTLHGIWDTEITLLSSGYLKYILLIMI 
AWLFIFILMKAGLTQVNQLRDEYNRLEER* 

Sequence 305

Contig_0465_pos_11013_10264, 
putative peptide of unknown function 

atgaaattctgccctcattgtggaaatccgataaaaaaggaacagtcattttgtaataaa 
tgtggaaaacatttaaagacatcgacacaaagaaaaagtgaaaatcaaattgaacatatg 

cgtgaacagcaatcgtatatttcttgtgaggaaagacaacatcatgattcaacattttat
aaagaacaaaaacatactggttggctaattgtattatcaattatatttgtcttgttgata 
gcagcgctattgtatggtgcgtactatacttacaatcattatattagtgatgagcaaagt 
catcaaacaacacagtctcagcaatcaaatgaaagtggtcaaaataaggatcaatccact
ggtccaagcattgatgtttttagtgatgactttgatcaaggttatatgaagtcagcttca 

acaagtggatatagaggtgtttataatggaatgacacgtgaagaagttgaagataaattt
ggaacatccaatggttctgtagaaagtttgaagtggagttacgaaaaatatggtgattta 
gctgtagcctacgatgataatgaagttgttagcgtaggtgtagcacctaatcatatttca 
gaagatcaatttttaagtatgtataatgaaccggatgatagaaattcaagccaactcatt 
tatgatagtaacaaagataatgacttctctgtgttagctaatgttaaaaatggatatgtt 

actgtcattgaaaatgtaaatcaaatttaa



Sequence 306 

MKFCPHCGNPIKKEQSFCNKCGKHLKTSTQRKSENQIEHMREQQSYISCEERQHHDSTFY
KEQKHTGWLIVLSIIFVLLIAALLYGAYYTYNHYISDEQSHQTTQSQQSNESGQNKDQST 
GPSIDVFSDDFDQGYMKSASTSGYRGVYNGMTREEVEDKFGTSNGSVESLKWSYEKYGDL 
AVAYDDNEVVSVGVAPNHISEDQFLSMYNEPDDRNSSQLIYDSNKDNDFSVLANVKNGYV
TVIENVNQI* 


Sequence 307 

Contig_0465_pos_7177_6710,

is similar to (with p-value 8.0e-58) 

>gp:gp|AJ000974|BSPYREYLO_2 Bacillus subtilis pyrE to yloA gene
region. NID: g2462954. >gp:gp|Z99112|BSUB0009_28 Bacillus
subtilis complete genome (section 9 of 21): from 1598421 to
1807200. NID: g2633902.

gtgaaagataaatatccgcaattacgcattaaaatgaaaaaaccggaacttacgttagag 
gaacaaggtgagaaatataatcctgctttatggaagaatgatcctaaccaatgttgctac 
atacgcaagattaaaccactagaagacgtattatctggtgctgtagcttggatatcaggt
cttagacgagcacaatcaccaacacgagcacatacaaatttcattaacaaagatgaaaga 
tttaagtcaattaaagtgtgtcccttaatctattggacagaagaagaagtatggtcttat 
atacgtgataaggatttaccatataatgaattacatgatcaaaattatccaagtattggt 
tgtattccatgtacatcacccgtatttgattctaatgattcacgtgctggtcgttggtcc
aattctagtaagactgaatgtggattacatgtagctgataaaccataa

Sequence 308

VKDKYPQLRIKMKKPELTLEEQGEKYNPALWKNDPNQCCYIRKIKPLEDVLSGAVAWISG 
LRRAQSPTRAHTNFINKDERFKSIKVCPLIYWTEEEVWSYIRDKDLPYNELHDQNYPSIG
CIPCTSPVFDSNDSRAGRWSNSSKTECGLHVADKP* 

Sequence 309 

Contig_0465_pos_2817_2065,

is similar to (with p-value 1.0e-71)

>sp:sp|P29928|SUMT_BACME UROPORPHYRIN-III C-METHYLTRANSFERASE
(EC 2.1.1.107) (UROGEN III METHYLASE) (SUMT) (UROPORPHYRINOGEN
III METHYLASE) (UROM). >pir:pir|A42479|A42479
S-adenosyl-L-methionine uroporphyrinogen III methyltransferase -
Bacillus megaterium >gp:gp|M62881|BACCOBA_1 Bacillus megaterium
S-adenosy-L-methionine:uroporphyrinogen III methyltransferase
(COBA) gene, complete cds. NID: g142694.

gtggttcatatggggaaagtatatttagttggagctggacctggtgatccagaattaata 
acgttaaaaggtttaaaagccattaaagaagccgatgtcatcctttatgaccgacttgta 

aataaagaaatacttaattatgcttctccttctactaagttcttctattgcggtaaggat
cctcacaggcactccttaccgcaggaagaaacaaataaaatgatggtaaccttagccaaa 
aaagggcacatagttacacgtttaaagggtggcgatccatttgtttttggacgtggcgga 
gaagaagcagaggaattagcatgtcataatatccactttgaaattatacctggaattcca 
gtaacacatcgtgattatagttcttctgtagcatttgtaactgcagtgaataaacctggt 

atggataaaggcaaatactggcaacatttggccaatggtcctgaaactttatgtatttat
atgggggttaagagactcagtgaaatttgtgagttgttaatacaatatggtcgttcgtca 
gaaacaccagtagctctcgtgcatatgggaacgtcaaaacagcaaatgacagtgactggg 
acactcgatacaattcaagaacgagcacatcatattcagaatccagcaatgattattgta
ggcgaagtggttaagatgagagaaaaaattaattggtttgtagaacaggcaactgttcaa 

aatgaaacgttaacggaaatgtcatcaacttag

Sequence 310 

VVHMGKVYLVGAGPGDPELITLKGLKAIKEADVILYDRLVNKEILNYASPSTKFFYCGKD 
PHRHSLPQEETNKMMVTLAKKGHIVTRLKGGDPFVFGRGGEEAEELACHNIHFEIIPGIP
VTHRDYSSSVAFVTAVNKPGMDKGKYWQHLANGPETLCIYMGVKRLSEICELLIQYGRSS
ETPVALVHMGTSKQQMTVTGTLDTIQERAHHIQNPAMIIVGEVVKMREKINWFVEQATVQ
NETLTEMSST* 

Sequence 311

Contig_0465_pos_1984_1379,

is similar to (with p-value 4.0e-19) 

>gp:gp|AJ000974|BSPYREYLO_8 Bacillus subtilis pyrE to yloA gene
region. NID: g2462954. >gp:gp|Z99112|BSUB0009_34 Bacillus
subtilis complete genome (section 9 of 21): from 1598421 to
1807200. NID: g2633902.

atgcccttaatgattgatttaagtaacaagaaagtcgtcattgtaggtggaggtaaagtg 
gcaacacgtcgtgctaaaactttattagcttatacaaaacatattcatgttgtaagtcca 
acaattaccgatacattacaaaaatatctagaaacgaagcaaatcacttatgaaaagaaa 
cacttcgaaccacaagatgttgagaatgctgatgtggtcatcgcggctactaatcaatct 

gatgttaacaacgatgtgggggcagctttgtctaagaacgtattatttaatcatgcagga
caagcagacctaggtaatgtaacgttccctaatttcttaaaaagagataaattaacaata 
agtgtatcaactgatggtgcaagtcctaaattaggtcaacgaattattaaagatttaaaa 
gatacatacaatgaagactattcaatgtatattcagtttttatatgaaagtagacaatat
attaaatcacttaaaattgagccatctgataaacaagcgttactcgagcaaattttgtca 

gacaaatatttagatgagaagaagcaacaagatttcatccgatggctaaaatcacaagtc
aaatga 

Sequence 312 

MPLMIDLSNKKVVIVGGGKVATRRAKTLLAYTKHIHVVSPTITDTLQKYLETKQITYEKK
HFEPQDVENADVVIAATNQSDVNNDVGAALSKNVLFNHAGQADLGNVTFPNFLKRDKLTI
SVSTDGASPKLGQRIIKDLKDTYNEDYSMYIQFLYESRQYIKSLKIEPSDKQALLEQILS 
DKYLDEKKQQDFIRWLKSQVK* 

Sequence 313

Contig_0465_pos_1292_459,

putative peptide of unknown function 

atgggatttggcgcttcatcgtcatcaatattattaacttacggtatagcaccggcagta 
gtgtcagcaaccgttcatttttctgaaattgcaacaacagctgcatctgggacatcacat 

tggagatttgataatgttcataaaccaacaatgttgaagttagctatacctgggtcaata
agcgcctttatcggtgcaggtgttttgacatttattcatggtgattatattaaaccattc 
attgctttattcttgttaagtatgggattttatattttgtatcaatttctatttaaacgt 
gcacatgaacatcatcatcatgtgggaaatttgagtagttttaaagtaattccacaaggt 
tttgtggcaggatttttagacgcaatcggtggtggtggttggggaccggttaatacgccg 

ctcctgctttcaagtaaaaaaattcaaccacgatatgcgattggaacagtctcagcaagt
gaattttttgttacgtcatctgccgctttaagtttcattatctttttaggagtcactcaa 
attaattggtttgctgtaattgctttaagtctcggtggaatggtagcagcacctatttca 
gcgtatttagttaaagtgttacccattaacattcttgcaatttgtgtcggtggtttaatt 
atttttacaaatagtaatgcattattaagctattttgtaaaagataacactatttcaaat 

acagttcgattcattattattcttgcaattattattttgcttgtttttcaagtcgttcga
aacaagaaattgtctttttcttataagaaaagccgagtaaacaaatataattaa 

Sequence 314

MGFGASSSSILLTYGIAPAVVSATVHFSEIATTAASGTSHWRFDNVHKPTMLKLAIPGSI
SAFIGAGVLTFIHGDYIKPFIALFLLSMGFYILYQFLFKRAHEHHHHVGNLSSFKVIPQG
FVAGFLDAIGGGGWGPVNTPLLLSSKKIQPRYAIGTVSASEFFVTSSAALSFIIFLGVTQ
INWFAVIALSLGGMVAAPISAYLVKVLPINILAICVGGLIIFTNSNALLSYFVKDNTISN
TVRFIIILAIIILLVFQVVRNKKLSFSYKKSRVNKYN*

Sequence 315

Contig_0465_pos_0_435,

is similar to (with p-value 7.0e-26) 

>gp:gp|AJ000974|BSPYREYLO_4 Bacillus subtilis pyrE to yloA gene
region. NID: g2462954. >gp:gp|Z99112|BSUB0009_30 Bacillus
subtilis complete genome (section 9 of 21): from 1598421 to
1807200. NID: g2633902.

atgtctaacaatgaaacaataaccaattatacaattaaacctcatggaggagaactcatc 
aatcgtgttgttgaaggaaacgaacgtgaacgtttgattgaggaagcattaaattttaaa 
ccgattactttaaatccttggggaatatcggatctagagctcataggtattggcggattt 
agtccccttacaggatttatgaacaaggaagactacactaaggttatagaggaaacacat
ttaagcaatggcttagtttggagtattcctatcactttacctgtaacagaatccgaagca 
gataaacttgaaataggtgatgatattgctttatatggtgaagatggtcagttatatgga 
acgcttaaattagaagaaaagtacacatatgataaagaaaaagaagcgcgtttggtgtac 
ggaactactgaagaa 


Sequence 316 

MSNNETITNYTIKPHGGELINRVVEGNERERLIEEALNFKPITLNPWGISDLELIGIGGF 
SPLTGFMNKEDYTKVIEETHLSNGLVWSIPITLPVTESEADKLEIGDDIALYGEDGQLYG
TLKLEEKYTYDKEKEARLVYGTTEE 


Sequence 317 

Contig_0466_pos_3615_2260,

is similar to (with p-value 0.0e+00) 

>gp:gp|Y09570|SAFEMD_1 S. aureus femD gene. NID: g1684748.
>gp:gp|Y15477|SAARGFEMD_4 Staphylococcus aureus argI, glmM genes
and ORF1 and ORF2. NID: g3892891.

atgggaaaatattttggtactgatggtgttcgtggtgtcgctaaccaagaactcacacct 
gaattggcttttaaactaggtagatacggaggatatgttctcgcacataataagggtgaa 
aagcatcctcgagttttagtaggaagagatacaagagtttcaggagaaatgctagaatct
gcattaattgctggtttaatttcaattggcgcagaagtgatgcgcttaggtgttatttca
acaccgggtgtggcttatttgactaaagaaatggaagcagcattaggtgttatgatttct 
gcgtcacataatccggttgctgataatggaattaaattttttggttcagatggctttaaa 
ttgtcagatgatcaagaaaatgaaattgagcaattattagatcaaaccaatcctgattta 
ccacgaccagtaggagaggatattgtacattattcagattattttgaaggtgcacaaaag
tatctaagttatcttaaatcaactgttgatgttaattttgagggtcttaaaattgtatta 
gatggtgcaaacgggtcaacttcttctttagccccattcttgtttggcgatttagaagcg 
gatactgagacaattggatgtaatccagatggttataacattaatgaacaatgtggctct 
actcatccagaaaaattagctgaagctgtgttagaaactgaaagtgactttggtttagct 

tttgatggagatggcgatcgaattattgcggtagatgaaaatggacaaattgtagatgga
gatcaaattatgttcattattggtcaagagatgtataaaaaccaagaactcaatggaaat 
atgatagtttcgacagtaatgagtaaccttggtttctacaaagctctagaaaaagaaggt 
attcagtcaaacaaaactaaagttggagatcgctatgttgtcgaggaaatgagaagagga 
aattataatcttggtggtgaacaatccggtcatatcgtattaatggattacaatactact 

ggtgatggattattaacgggtgttcagttggcttccgttattaaaatgagtggtaaaact
ctaagcgagttagcttctcaaatgaaaaagtacccacaatctttaattaatgtgagagtg 
actgacaaatatcgtgttgaagagaatattcatgttcaagagataatgacgaaagttgaa 
acagagatgaatggtgaaggaagaattcttgttcgtccttctggaactgaacctttagta 
cgtgtaatggttgaggctgcaactgacgcggatgctgaaagatatgctcaaagtatcgct 

gacgttgttgaagacaaaatgggcttagataaataa

Sequence 318 

MGKYFGTDGVRGVANQELTPELAFKLGRYGGYVLAHNKGEKHPRVLVGRDTRVSGEMLES 
ALIAGLISIGAEVMRLGVISTPGVAYLTKEMEAALGVMISASHNPVADNGIKFFGSDGFK 

LSDDQENEIEQLLDQTNPDLPRPVGEDIVHYSDYFEGAQKYLSYLKSTVDVNFEGLKIVL
DGANGSTSSLAPFLFGDLEADTETIGCNPDGYNINEQCGSTHPEKLAEAVLETESDFGLA 
FDGDGDRIIAVDENGQI VDGDQIMFIIGQEMYKNQELNGNMIVSTVMSNLGFYKALEKEG 
IQSNKTKVGDRYVVEEMRRGNYNLGGEQSGHIVLMDYNTTGDGLLTGVQLASVIKMSGKT 
LSELASQMKKYPQSLINVRVTDKYRVEENIHVQEIMTKVETEMNGEGRILVRPSGTEPLV 

RVMVEAATDADAERYAQSIADVVEDKMGLDK*



Sequence 319

Contig_0466_pos_1040_24,

is similar to (with p-value 0.0e+00) 

>sp:sp|P39754|GLMS_BACSU GLUCOSAMINE--FRUCTOSE-6-PHOSPHATE
AMINOTRANSFERASE (ISOMERIZING) (EC 2.6.1.16) (HEXOSEPHOSPHATE
AMINOTRANSFERASE) (D-FRUCTOSE-6-PHOSPHATE AMIDOTRANSFERASE)
(GFAT) (L-GLUTAMINE-D-FRUCTOSE-6-PHOSPHATE AMIDOTRANSFERASE)
(GLUCOSAMINE-6-PHOSPHATE SYNTHASE).

atgttacaaactacaaaccaatacaaagagatacatgaccatgaaatagttattgttaag 
cgagacacagtagaaattaaagatcttgaggggcacattcaacaacgtgatacgtatacg 
gcagaaatagatgctgctgatgcagaaaaaggcgtatatgatcattacatgttaaaagaa

attcatgaacagcctgcagtgatgcgtcgcattattcaagaatatcaagatgaaaaaggt
aatttaaaaatcgattcagagattattaatgatgtagcagatgctgatcgtatttacatc 
gttgcagctggtactagttatcatgctggattggttggtaaagaatttattgaaaaatgg 
gcaggtgtacctactgaggttcatgtagcttctgaatttgtatataatatgccacttctt 
tctgaaaaaccactatttatttatatttcacaatctggtgaaacagctgatagtcgtgct 

gtattagttgaaacaaataagttaggtcacaaatcattaacaattactaatgttgctggt
tcaacattatcacgtgaagcggatcatacattacttttacatgctggacctgagattgca 
gtcgcatctacaaaagcatatacagcgcaaattgctgttttatctatcttatctcaaatt 
gttgctaaaaatcatggtcgtgaaaccgatgttgatttattaagagaactagctaaggtt 
actacagctattgaaacaattgttgacgatgcacctaagatggagcaaattgcaacggat 

ttcttaaaaactactcgtaatgcattcttcattggacgaacaattgattataatgttagt
ttagaaggtgcattaaaattaaaagaaatttcttatattcaagctgaaggatttgcaggt
ggggaattaaagcacggaacaatcgctttgattgaagatggcacacctgttataggttta
gctacacaagaaaacgttaatctatcaattcgtggaaatatgaaagaagtactttag

Sequence 320

MLQTTNQYKEIHDHEIVIVKRDTVEIKDLEGHIQQRDTYTAEIDAADAEKGVYDHYMLKE
IHEQPAVMRRIIQEYQDEKGNLKIDSEIINDVADADRIYIVAAGTSYHAGLVGKEFIEKW
AGVPTEVHVASEFVYNMPLLSEKPLFIYISQSGETADSRAVLVETNKLGHKSLTITNVAG
STLSREADHTLLLHAGPEIAVASTKAYTAQIAVLSILSQIVAKNHGRETDVDLLRELAKV
TTAIETIVDDAPKMEQIATDFLKTTRNAFFIGRTIDYNVSLEGALKLKEISYIQAEGFAG
GELKHGTIALIEDGTPVIGLATQENVNLSIRGNMKEVL* 

Sequence 321

Contig_0467_pos_8435_9724,

is similar to (with p-value 0.0e+00) 

>sp:sp|P30949|GSA_BACSU GLUTAMATE-1-SEMIALDEHYDE 2,1-AMINOMUTASE
(EC 5.4.3.8) (GSA) (GLUTAMATE-1-SEMIALDEHYDE AMINOTRANSFERASE)
(GSA-AT). >pir:pir|D42728|D42728 glutamate-1-semialdehyde
2,1-aminomutase (EC 5.4.3.8) - Bacillus subtilis
>gp:gp|M57676|BACHEMAXC_6 Bacillus subtilis hemAXCDBL gene
cluster. NID: g143034. >gp:gp|Z99118|BSUB0015_77 Bacillus
subtilis complete genome (section 15 of 21): from 2795131 to
3013540. NID: g2635200.

atgaaatttactgaaagcgaacgtcttcagcaactttctaatgaatatattttgggaggt
gtgaattcaccttcaaggtcgtataaagcagtaggtggtggggcaccagtagtaatgaaa 
gaaggtcatggcgcttacttatatgatgtagatggtaataaatatatcgactatctacaa
gcttatggtccaataataactggtcatgcacacccacatatcaccgaagctatccaagat 
caagcagcaaaaggcgtactttatggtacccctactgaattagaaataaatttctcaaaa 

aaacttagagaagcagttccttctttagaaaagattcgtttcgtgaactctggtactgaa
gcagttatgacaacaattagagttgctcgtgcttatactaaaagaaacaaaatcattaag 
tttgcaggctcttatcatggtcattctgatttagttttagtggcagctggaagtggacct 
tctcaacttggttctccagattctgctggtgtcccccaaagtgttgcacaagaggttatt 
acagtaccgtttaatgatatagaatcatatagagaagctattgattattggaaagacgac 

attgctgcagtattagtagagccgattgtgggtaattttgggatggtcatgccacaacca
ggtttcttagaagaagtaaataaaatttctcatgataatggaacattagttatctatgat 
gaagttatcactgcttttcgtttccattatggtgcagctcaagatttattaggtgttaaa 
ccagacctcactgcttttggtaagattgttggcggtggtttaccaattggaggctatggt 
ggtcgacaagatattatggagcacgttgcaccattaggtccagcttatcaagcaggaaca 

atggccggtaacccgttatctatgagagcaggtattgctttattagaggtacttgaacaa
gaaggtgtttatgataaacttgatcaattaggtcgtcgtcttgaagaagggttacaaaaa 
ttaatagataagcatcatattacagcaacaataaatcgaatctatggctcactgacattg 
tatttcacaaatgaaaaagttacacattatgaacaagttgaaaactctgatggagatgct 
ttcgctcaattctttaaattaatgttgaaccaaggcattaatctcgcgccttctaaattt 

gaagcatggttcttaactacagaacatactgaagaagatatcgatcgcacactagaagca
gctgattatgcatttagtaaaatgaaataa

Sequence 322 

MKFTESERLQQLSNEYILGGVNSPSRSYKAVGGGAPVVMKEGHGAYLYDVDGNKYIDYLQ 
AYGPIITGHAHPHITEAIQDQAAKGVLYGTPTELEINFSKKLREAVPSLEKIRFVNSGTE
AVMTTIRVARAYTKRNKIIKFAGSYHGHSDLVLVAAGSGPSQLGSPDSAGVPQSVAQEVI 
TVPFNDIESYREAIDYWKDDIAAVLVEPIVGNFGMVMPQPGFLEEVNKISHDNGTLVIYD
EVITAFRFHYGAAQDLLGVKPDLTAFGKIVGGGLPIGGYGGRQDIMEHVAPLGPAYQAGT 
MAGNPLSMRAGIALLEVLEQEGVYDKLDQLGRRLEEGLQKLIDKHHITATINRIYGSLTL 
YFTNEKVTHYEQVENSDGDAFAQFFKLMLNQGINLAPSKFEAWFLTTEHTEEDIDRTLEA
ADYAFSKMK* 

Sequence 323 

Contig_0467_pos_10082_11125,
putative peptide of unknown function

atgtctatcgcttctttacttcctgataatattggtctaaaaacgttagcaggtgtcagt 
gcagtagttgccatgcaaccgagtgtttatcgctcaatcaaaactgtttctgaacaagct 
attggtaatgtgattggtgcattacttgcagtaacaatggtaacgatattcaataataat 
ttcattatcatgggcgttaccgttattttactcattgcaattttgttccaatttaatctt
gcccatgtagcaacacttgcaagcgtaactgcacttataattatggggcaacacactggt
tctttctatgttgtcgcattttttagatttgtactagtgatgattggtgtattgagttct 
tctgttgtcaatctaatttttttacctcctaagtttgaaacaaaaatttattataattct 
atgaatatttcttctgatatatttgtttggtttaaacttgtactcaatgacacatcagaa 
tttcataatattaaacaggatggtgatcaactaaactcacgcatcaataaattagaaaag
attttcgactattacaatgaagaaagaccattaacaaaaaaacatatttatcaacagaat 
agaaaaaaaatactatttagagaagtagttagaacgaccagacaagcatatgaagtgcta 
aaacgaatgtcacgatatcaaaatgatttatatcaactaaataatcaattacttttacaa 
atcaaattagaacttgattcattagttactttacatgaacaaatatttaagagtctatca 
aaaaaagctagatatgatgtcactcaattagattatgaagttgacaatcctcagaagaaa
aacttgatggatgcttttcagcaagaattaattaaaaacccacatcagacgcaatattct 
tatagcaatatgatgcaaattattgctgaaattgaagaatacagatatcaacttgaacac 
ttagatagaatccgtttaagtttctttacctatcaccgttctgatactgatatagacatt 
tcaaatgaggactttgacttataa 


Sequence 324 

MSIASLLPDNIGLKTLAGVSAVVAMQPSVYRSIKTVSEQAIGNVIGALLAVTMVTIFNNN 
FIIMGVTVILLIAILFQFNLAHVATLASVTALIIMGQHTGSFYVVAFFRFVLVMIGVLSS
SVVNLIFLPPKFETKIYYNSMNISSDIFVWFKLVLNDTSEFHNIKQDGDQLNSRINKLEK
IFDYYNEERPLTKKHIYQQNRKKILFREVVRTTRQAYEVLKRMSRYQNDLYQLNNQLLLQ
IKLELDSLVTLHEQIFKSLSKKARYDVTQLDYEVDNPQKKNLMDAFQQELIKNPHQTQYS
YSNMMQIIAEIEEYRYQLEHLDRIRLSFFTYHRSDTDIDISNEDFDL*

Sequence 325

Contig_0467_pos_12931_11285,

is similar to (with p-value 0.0e+00) 

>sp:sp|P45861|YWJA_BACSU HYPOTHETICAL ABC TRANSPORTER ATP-BINDING
PROTEIN IN ACDA 5'REGION. >pir:pir|S55415|S55415 ABC
transporter - Bacillus subtilis >gp:gp|Z49782|BSDNA320D_2
B.subtilis chromosomal DNA (region 320-321 degrees). NID:
g853752. >gp:gp|Z99123|BSUB0020_20 Bacillus subtilis complete
genome (section 20 of 21): from 3798401 to 4010550. NID:
g2636240.

atgctcatacctttattgattaaatatgctatagatggcgtgattaataatcattcgctt 

acaaatcaagaaaaatttagtcaccttggtgtagcaataggaattgcattatttattttc
ttaattgttcgcccgccgattgagtttattagacaatatttagctcaatggacaagtaat 
aaaatactatatgatattcgtaaacaattgtataatcacttgcaagcactaagtgttcgc 
ttttatgcaaataatcaagtcggtcaagtcatttcaagagtgattaatgatgtcgaacaa 
acaaaagactttattcttactggattgatgaatatctggcttgactgtataacgattatt 

atcgcactttctattatgttcttccttgatgtaaaattgacgtttgctgcaatttttatt
tttccattttatattttaactgtttattttttctttggaagattacgaaaacttacacgt 
gtgcgctcacaagctctagcagaagtacaaggtttcttacatgagcgggttcaaggaatg 
tctgttattaaaagttttgctattgaagacaatgaagctaaaaattttgataaccataac 
aagaattttttacaacgagccttccaacatacaagatggaacgcatattcttttgctgct 

attaatactgttacagatttaggcccaataattgtgattggcgtgggttcatatttggca
attacaggatcgattactgtcggaactctagcagcatttgtcggttatctagaacaatta 
tttggaccacttagaagactagtatcttcatttactacacttacacaaagttttgcatct 
atggacagagtatttcagttaatggatgaggattacgacatcaaaaatggcattggagca 
cagccaattaaaatcagtaagggtcaaattgatttaaaacatgtgagtttcaaatataat 

gaaaatgaaaaagaagtattacacgatattaatttaacaattaacaaaggcgaaactgta
gcatttgtaggtatgagtggtggtggaaaatctactttgattaatcttataccaagattt 
tatgatgttactcaaggtgaaatacttatcgatcatcataatgttaaagatttcctaact 
ggtagtttaaggaatcaaataggcttagtacaacaagataatattcttttttctgatacg 
gttaaggagaatattttgttgggtaggcctgatgcgactgatgatgaagtcgtagaagct 

gcaaaaatggcgaatgcccatgattttatttcaaatttaccgaatggatatgatactgaa
gtaggagaacgaggagttaaattatctggtggacaaaaacaaaggttgtcaattgcacgt 
atctttttaaataatcctcctgttttaatattagatgaagcaacaagtgcattggattta 
gagagtgaagctattattcaagaagcacttgatgttttaagtaaggatagaacaacatta 
attgttgcacatcgtctatctaccattactcatgcagatagaatagttgtaatggaaaat
ggacgaattgttgagactggcacacaccaacaattaattaataaacgtggtgcttatgag
catctttatagtattcaaaatttataa 

Sequence 326 

MLIPLLIKYAIDGVINNHSLTNQEKFSHLGVAIGIALFIFLIVRPPIEFIRQYLAQWTSN
KILYDIRKQLYNHLQALSVRFYANNQVGQVISRVINDVEQTKDFILTGLMNIWLDCITII 
IALSIMFFLDVKLTFAAIFIFPFYILTVYFFFGRLRKLTRVRSQALAEVQGFLHERVQGM 
SVIKSFAIEDNEAKNFDNHNKNFLQRAFQHTRWNAYSFAAINTVTDLGPIIVIGVGSYLA
ITGSITVGTLAAFVGYLEQLFGPLRRLVSSFTTLTQSFASMDRVFQLMDEDYDIKNGIGA
QPIKISKGQIDLKHVSFKYNENEKEVLHDINLTINKGETVAFVGMSGGGKSTLINLIPRF
YDVTQGEILIDHHNVKDFLTGSLRNQIGLVQQDNILFSDTVKENILLGRPDATDDEVVEA 
AKMANAHDFISNLPNGYDTEVGERGVKLSGGQKQRLSIARIFLNNPPVLILDEATSALDL 
ESEAIIQEALDVLSKDRTTLIVAHRLSTITHADRIVVMENGRIVETGTHQQLINKRGAYE
HLYSIQNL* 


Sequence 327 

Contig_0467_pos_6847_6395,

is similar to (with p-value 6.0e-58) 

>sp:sp|P71086|FUR3_BACSU FERRIC UPTAKE REGULATION PROTEIN
HOMOLOG 3. >gp:gp|Z99108|BSUB0005_141 Bacillus subtilis complete
genome (section 5 of 21): from 802821 to 1011250. NID:
g2633055. >gp:gp|Z82044|BSZ82044_9 B.subtilis 25 kb genomic DNA
segment (from sspE to katA). NID: g1673387.

gtgagtgcggaacttgaatctattgatcatgaacttgaagagtcaattgcttcattaaga 
aaagcgggcgttcgcattacaccccaaagacaagcaattatgcgttatcttatatcttca
cattcacatccaacagcagatgaaatatatcaagcactttcacctaaatttcctaatata 
agtgttgctactatctataataatctaagagtttttaaagatattggtatagtcaaagag 
ttaacatatggtgattcatctagtaggtttgattttaatacacataatcactaccatatt 
atatgtgaaaaatgtggtaaaatcgttgacttccattatccacaattagatgaagtagag 
caattagctcaacatgtaacagattttgatgttactcatcatcggatggaaatatatgga
gtatgtaaagaatgtaaagaagaaggaaattga 

Sequence 328

VSAELESIDHELEESIASLRKAGVRITPQRQAIMRYLISSHSHPTADEIYQALSPKFPNI
SVATIYNNLRVFKDIGIVKELTYGDSSSRFDFNTHNHYHIICEKCGKIVDFHYPQLDEVE
QLAQHVTDFDVTHHRMEIYGVCKECKEEGN*

Sequence 329 

Contig_0467_pos_4571_4227,

putative peptide of unknown function

gtgacaaaccggaggaaggtggggatgacgtcaaatcatcatgccccttatgatttgggc 
tacacacgtgctacaatggacaatacaaagggtagcgaaaccgcgaggtcaagcaaatcc 
cataaagttgttctcagttcggattgtagtctgcaactcgactatatgaagctggaatcg 
ctagtaatcgtagatcagcatgctacggtgaatacgttcccgggtcttgtacacaccgcc 

cgtcacaccacgagagtttgtaacacccgaagccggtggagtaaccatttggagctagcc
gtcgaaggtgggacaaatgattggggtgaagtcgtaacaaggtag 

Sequence 330 

VTNRRKVGMTSNHHAPYDLGYTRATMDNTKGSETARSSKSHKVVLSSDCSLQLDYMKLES 
LVIVDQHATVNTFPGLVHTARHTTRVCNTRSRWSNHLELAVEGGTNDWGEVVTR*

Sequence 331 

Contig_0468_pos_6704_7495,

is similar to (with p-value 2.0e-82)

>sp:sp|O33812|YLAC_STAXY HYPOTHETICAL TRANSCRIPTIONAL REGULATOR
IN LACR 5'REGION (FRAGMENT). >gp:gp|Y14599|SXLACRPH_1
Staphylococcus xylosus lacR, lacP, lacH genes and 2 ORF's. NID:
g2462702.

atgaaaacagaaagttttactagagcagcagaaaatttatatacttcgcagccttctgtg
agtcgtgatattaaacgtttagaattaaaatataatgttaaaatatttgaatttaaatct
ccatatttaaaactaactagagatggcgaaaagctattacaatacgcattgcaacgggaa 
agtattgaacaagaattatggcaaaacttaacatcggaatctgaaatcatctcaggcacc 
ttaacaattggaagcagttatacatatggtgaatatttattatcagaacagcttaccagt 
cttatgcaacaataccctaagttacatattcatttacgtgttaataattcagattctgtt
ataaatgatattaaacacaacagagtagatataggtattgtagaaaaggaaattcaagac 
aatgcaataaaatgtaaggaaataatggaagacgaaatggtgtatatttacaaaaaatcg 
attcaacctagaatggatatatgtttcgttagagaaaaagggtctggaacaaggttttat 
caggaagtaggtctttctgagttgaaattaaatccatatttgatagaaattaacaatatt 
aagattattaaacaaatggtagaggctggaaatgggtttgcaattatttcaaaatcagca
cttcatccagaagattatgaaaaattaatgataacaactttaaatgtgaaacgtcactat 
taccttgctcaacatgttgataaatatataggtgaaaatattagagctgtcattgaaatg 
attatgaagtag 

Sequence 332

MKTESFTRAAENLYTSQPSVSRDIKRLELKYNVKIFEFKSPYLKLTRDGEKLLQYALQRE 
SIEQELWQNLTSESEIISGTLTIGSSYTYGEYLLSEQLTSLMQQYPKLHIHLRVNNSDSV 
INDIKHNRVDIGIVEKEIQDNAIKCKEIMEDEMVYIYKKSIQPRMDICFVREKGSGTRFY
QEVGLSELKLNPYLIEINNIKIIKQMVEAGNGFAIISKSALHPEDYEKLMITTLNVKRHY 

YLAQHVDKYIGENIRAVIEMIMK*



Sequence 333

Contig_0468_pos_14619_13816,

is similar to (with p-value 8.0e-99) 

>gp:gp|U92974|LLU92974_13 Lactococcus lactis unknown gene,
partial cds, and HisC (hisC), unknown, HisG (hisG), unknown,
HisB (hisB), unknown, HisH (hisH), HisA (hisA), HisF (hisF),
HisIE (hisIE), unknown, unknown, LeuA (leuA), LeuB (leuB),
LeuC (leuC), LeuD (leuD), unknown, IlvD (ilvD), IlvB (ilvB),
IlvN, IlvC (ilvC), IlvA (ilvA), AldB (aldB) and AldR (aldR)
genes, complete cds. NID: g2565137.

atggattatagagtactactttattataaatatgtaactatagatgaccctgaaactttt 

gcagccgaacatttgaaattttgtaaggaacatcatttaaaaggaagaatactagtttca
acggaaggcattaatggaacattatctggaacaaaagaagatactgataaatatatagag
catatgcatgcagatagtcgttttgctgatttaacttttaaaattgatgaagctgaaagt 
catgcgtttaaaaagatgcacgtgcgtccaagacgtgaaattgttgcacttgacttagaa 
gaagatattaatccacgtgaaattaccggtaaatactattctcctaaagaatttaaagcc 

gcactagaagatgaaaatactgttatattagatgctcgaaatgattatgaatacgattta
ggacatttccgtggagctattcgtcctgatataacacgattccgtgacttacctgaatgg 
gtgcgtaataataaagaacaactcgacggaaaaaatattgtcacatattgtacaggtggc 
attcgttgtgaaaaattttctggttggttagtaaaagaaggatttgaaaacgtaggtcag 
ttgcatggtggtattgctacatacggtaaagaccctgaaactaaagggctatattgggat 

ggtaagatgtatgtatttgatgaacgtattagtgtcgatgtgaatcaaattgataaaaca
gtcatcggcaaagagcattttgatggtactaaatactgtcttattctaaacctagtatat 
cagtattttttacaatcgttctaa 

Sequence 334 

MDYRVLLYYKYVTIDDPETFAAEHLKFCKEHHLKGRILVSTEGINGTLSGTKEDTDKYIE
HMHADSRFADLTFKIDEAESHAFKKMHVRPRREIVALDLEEDINPREITGKYYSPKEFKA
ALEDENTVILDARNDYEYDLGHFRGAIRPDITRFRDLPEWVRNNKEQLDGKNIVTYCTGG
IRCEKFSGWLVKEGFENVGQLHGGIATYGKDPETKGLYWDGKMYVFDERISVDVNQIDKT
VIGKEHFDGTKYCLILNLVYQYFLQSF*


Sequence 335 

Contig_0468_pos_4208_3876,

putative peptide of unknown function 

gtgttagcagaggttaagttgtcgtctaaaaacaaaaaagcgatagcgttagccattggt
gcactgtcagtagataaatctattcgaaaccaaggtttaggtcaagccctgttaaaagct
gtagaagaacgtgctaaagaacaaggctattgtgctatttttgtaaataatcatcctcag 
tactttgagaaatctgattatgaagcagcccatttatataatatacatatagaagaaaaa 
cgaaatcatcaatcattattagtaaaatttctaaaaccagttcaaaatgaatggtctgga 
atgacggtgtattatccggaagtactggattga

Sequence 336 

VLAEVKLSSKNKKAIALAIGALSVDKSIRNQGLGQALLKAVEERAKEQGYCAIFVNNHPQ
YFEKSDYEAAHLYNIHIEEKRNHQSLLVKFLKPVQNEWSGMTVYYPEVLD*


Sequence 337 

Contig_0468_pos_3149_2094,

is similar to (with p-value 1.0e-34) 

>sp:sp|P09122|DP3X_BACSU DNA POLYMERASE III SUBUNITS GAMMA AND
TAU (EC 2.7.7.7).

atggatcaagcaatagcgtttggagacgaacgacttactttacaagatgctttaaatgtt 
acaggtagtgttgatgaagcggcattaaatgagttatttaatgacattgtaaaaagtgat 
gttaaagccgcatttaatagatatcatcattttatttcagaaggtaaagaagtcaacaga 
ctcattaatgatatgatttactttgttagagatacaattatgaataaaacgtctaacgaa 

tccgttcattttgaatcacttattcatttcgacttagatatgttatacaggatgatagat
atcatcaatgatacactagtatccattaggttcagtgtaaatcaaagtgttcattttgaa 
gtgttgctagttaaacttgcagaaatgattaagacacagcctcaaactgtacaaaatgta 
gcaacagcatcggtagctaatgaaccagataatgagatgttattacaacgtttagaacaa 
cttgaaaatgagcttaaaaccttaaaagaacaagggatcaaaactaataaagttagtcaa 

caacctaagaaaccaacacgtacgattcaacgatctaaaaatacgttttctatgcaacaa
atagcgaaagtattagacaaagcaaacaaagatgatatcaaattgttgaagaaccattgg 
caagaagtgattgatcatgcaaaaagtaatgataaaaagtctttagtaagtttgctactg
aattcagaaccagtagcagctagtgaagatcatgtgttagttaaatttgatgaagaaatt
cattgtgaaatagtaaataaagatgatgaaaagagaaacaatattgaaagtgtagtttgt 

aatatagttaataaaactgtcaaagtagttggagtgccggctgaccaatggctgagagtg
agagcagagtacttacaaaatcgtaacaccaatgaaacacatcaaagcgaaaaacaaagc 
acacaacagtctcaacaaatagatattgctcaaaaagccaaaatattgaagtgtttacga 
cacgaccagatacaatctatggtacttctttcttag 

Sequence 338

MDQAIAFGDERLTLQDALNVTGSVDEAALNELFNDIVKSDVKAAFNRYHHFISEGKEVNR 
LINDMIYFVRDTIMNKTSNESVHFESLIHFDLDMLYRMIDIINDTLVSIRFSVNQSVHFE
VLLVKLAEMIKTQPQTVQNVATASVANEPDNEMLLQRLEQLENELKTLKEQGIKTNKVSQ 
QPKKPTRTIQRSKNTFSMQQIAKVLDKANKDDIKLLKNHWQEVIDHAKSNDKKSLVSLLL 

NSEPVAASEDHVLVKFDEEIHCEIVNKDDEKRNNIESVVCNIVNKTVKVVGVPADQWLRV
RAEYLQNRNTNETHQSEKQSTQQSQQIDIAQKAKILKCLRHDQIQSMVLLS* 

Sequence 339 

Contig_0468_pos_2070_451,

is similar to (with p-value 0.0e+00)

>sp:sp|P36430|SYL_BACSU LEUCYL-TRNA SYNTHETASE (EC 6.1.1.4)
(LEUCINE--TRNA LIGASE) (LEURS). >pir:pir|A41882|A41882
leucine--tRNA ligase (EC 6.1.1.4) - Bacillus subtilis
>gp:gp|M88581|BACLEUS_1 Bacillus subtilis leucyl-transfer RNA
synthase (leuS) gene, complete cds. NID: g143147.

gtgaatgaaattacgacaagtgataaagaacaagaagtcaaattgtatcaaaatgaagca 
tcaaaaaaatctgatttagaacgtacggacttagctaaagaaaaaacaggtgtgtttact
ggaacatttgcaattaatccgctctctggcgataaattacctatttggatagcagattat 
gttttatcaacttacggtactggtgcagtaatggctgtgcctggacatgatgagcgagat 

catgaatttgctacgaagtttaatttaccaattatcgaagttatagagggtggcgaagtt
caaaaatatgcatacacaggtgaaggaaaacacattaattctggagaattagacggtcta 
gaaaatgaagcggcaataagtaaagcgatagaattgcttgaatctaaaggtgctggtgag 
aaaaaagtcaattataaattacgtgattggttatttagtaggcaacgttattggggagag 
ccaattcctattatacattgggaagatggatcaatgactacagttcctgaagatgaattg
cctttactacttcctgaaacagatgaaattaagccatcaggtaccggtgaatctccactt
gcaaatatagatgcgttcgtaaacgttatcgatgaaaagacaggtatgaaggggcgccga 
gaaaccaatacaatgcctcaatgggctggcagttgctggtactatttacgttacattgat 
ccacataacgaaaaaatgatagcagatcctgaaaaattaaagcattggctacctgttgat 
ttatatattggaggcgtggaacatgcagtacttcacttattatatgcaagattctggcat
aaagtgttatatgacttaggtgttgtaccaacaaaagaaccattccaaaaactatacaat 
cagggaatgattttaggcgaaggcaatgaaaaaatgagtaagtctaaaggtaatgtgatt 
aatccagatgatattgttgcatcacatggtgctgatacattacgactatatgaaatgttt 
atgggacctttagatgctgcgatcgcatggagtgaaaaaggtttagatggttctagaaga 

ttcttagatcgtgtttggagacttatcattactgatgaaaattcaatcaataaaaaaatt
gtagattctaacaatcattcacttgataaggtttacaatcaaactgtgaaaaaagtaaca 
gaagattttgatacacttagttttaatactgcaatcagtcaattaatggtgtttattaat 
gagtgttataaaactaatgaagtttacaaaccttatatcgaagggtttgtaaaaatgtta 
tcgcctattgcaccacacattggtgaagaattatgggatcgattagggcatgaaaatacc 

attacttatcaaccatggccaacatttgatgaaagtttattagtagatgatgaagttgaa
atcgtagttcaagtcaatggtaaagttagagcaaaaatcaatattccaaaagatttatct 
aaagaagaaatgcaagacttagccttgtctaatgataatgttaaaatgagtattgaagga 
aaagaagttaaaaaagttattgctgtacctcaaaagctagttaatatagttgctaaataa 


Sequence 340 

VNEITTSDKEQEVKLYQNEASKKSDLERTDLAKEKTGVFTGTFAINPLSGDKLPIWIADY 
VLSTYGTGAVMAVPGHDERDHEFATKFNLPIIEVIEGGEVQKYAYTGEGKHINSGELDGL 
ENEAAISKAIELLESKGAGEKKVNYKLRDWLFSRQRYWGEPIPIIHWEDGSMTTVPEDEL 

PLLLPETDEIKPSGTGESPLANIDAFVNVIDEKTGMKGRRETNTMPQWAGSCWYYLRYID
PHNEKMIADPEKLKHWLPVDLYIGGVEHAVLHLLYARFWHKVLYDLGVVPTKEPFQKLYN 
QGMILGEGNEKMSKSKGNVINPDDIVASHGADTLRLYEMFMGPLDAAIAWSEKGLDGSRR
FLDRVWRLIITDENSINKKIVDSNNHSLDKVYNQTVKKVTEDFDTLSFNTAISQLMVFIN
ECYKTNEVYKPYIEGFVKMLSPIAPHIGEELWDRLGHENTITYQPWPTFDESLLVDDEVE 

IVVQVNGKVRAKINIPKDLSKEEMQDLALSNDNVKMSIEGKEVKKVIAVPQKLVNIVAK*



Sequence 341 

Contig_0469_pos_1346_4051,

is similar to (with p-value 0.0e+00)

>sp:sp|P09339|ACON_BACSU ACONITATE HYDRATASE (EC 4.2.1.3)
(CITRATE HYDRO-LYASE) (ACONITASE).

atggcttctaatattaaagaacaagcaaagaaacaattcgaattaaatggccaatcatat
acttactatgacttacaaacattagaagaaaaagggctagctaaaatttctaaattacca 
tactcaattcgcgtattgttagaatctgtgttacgacaagaggatgattttgttataaca
gatgatcatatcaaagcattaagtaaattcggaaatgcaggtaacgaaggtgaagttcca 
ttcaaaccttctagagttattttacaagactttacaggtgtgccagcagtagtagatttg 
gcttctttacgtaaagctatgaatgatgttggtggagatattaataaaatcaacccagaa 
gtacctgtggatttagttatcgaccattcagttcaagttgatagttacgctaatccagaa 

gcattagaacgtaatatgaaattagaatttgaacgtaactatgaacgttatcaattttta
aactgggcaacaaaagcttttgataactataatgcagtacctcctgctacaggtattgtc 
catcaagtaaacttagagtatttagcaaatgtagtacatgtaagagatgttgatggtgaa 
aaaacagcatttcctgacactttagtaggtactgattcacatactacaatgattaatggt 
attggtgttctaggttggggcgttggtggtatcgaagccgaagcaggtatgttaggacaa 

ccatcatatttcccaattcctgaagttatcggagtgcgtttaactcactctttaccacaa
ggctcaacagctacggatttagctttacgtgtgactgaagaattacgtaaaaaaggtgta 
gttggaaaatttgttgaattcttcggtccaggtgttcaacatttaccattagcagacaga 
gctacaattgctaacatggctccagaatatggtgcaacgtgtggtttcttcccagtagat
gaagaatcattgaaatatatgaaacttacaggccgtgacgaagaacatattgaattggtt 

aaagaatatttacaacaaaaccatatgttctttgatgtagaaaaagaggatcctgaatat
acagatgttattgatttagacttatctacagtagaggcatcactttctggtccaaagcgt 
ccacaagacttaattttcttaagtgatatgaaaaaagaatttgaaaaatcagtaactgct 
cctgctggtaatcaaggacatggacttgatcaaagtgaatttgataaaaaagcagaaatt 
aattttaatgatggatctaaagcaacaatgaaaacaggagatatagcaattgctgctatt
acctcatgtactaacacttctaatccatatgttatgttaggtgctggtttagttgctaaa
aaagctgtagaaaaaggattgaaagtaccagagtttgttaagacgtcacttgctccaggt 
tcaaaagttgttacaggatatttaagagattctggattacaacagtatttagatgattta 
ggtttcaatcttgttggttatggttgtactacatgtattggtaactcagggccactatta 
cctgaaattgaaaaggcagttgcggatgaagatttattagtaacttcagttttatcaggt
aatcgtaattttgaggggcgaatccatccattagtgaaagcaaactatttagcctcacca
caacttgttgtagcttatgcgcttgctggtacagtagatattgatttacaaaatgaacca 
attggtaaaggtaaagatggtaaagatgtatatttacaagacatttggccttcaatacaa 
gaagtttctgatactgtagataaagttgttacacctgaactattcttagaagaatataaa 

aatgtatatcataacaatgaaatgtggaatgaaatagatgtaaccgatgaaccattatat
gatttcgatcctaattcaacatatattcaaaatccaacatttttccaaggattatctaaa 
gagccgggtaaaattgaaccacttaaaagtttgagagttatgggtaaatttggtgattct 
gttacaacagaccatatttctccagcaggtgctatcggtaaagatacaccagcaggaaaa 
tacttattagatcatgatgttgcaattcgcaactttaactcttatggttcccgtcgcggt 

aaccacgaagttatggtacgtggtacatttgccaatattcgtatcaaaaaccaacttgct
ccaggtactgaaggcggatttacaacatattggcctaccggagaaataatgcctatatat 
gatgcagcaatgaaatataaagaagatggaactggcttagttgtcttagctggtaatgac 
tatggaatgggatcttctcgtgactgggctgcaaaaggtaccaatttattaggagttaaa 
actgtcattgcacaaagctatgaacgtattcatcgctctaacttagttatgatgggtgta 

ctaccgcttcaattccaacaaggagaatctgcagaagcactgggtcttgatggaaaagaa
gaaatatctgtagatattaatgaagatgtacagccacatgatcttgtaaatgtgactgca 
aaaaaagaaaatggtgaaatcattaatttcaaagctattgtacgttttgattcactagta 
gaattagattattatcgtcatggtggtattttacaaatggtactaagaaataaacttgcg 
cagtaa 


Sequence 342 

MASNIKEQAKKQFELNGQSYTYYDLQTLEEKGLAKISKLPYSIRVLLESVLRQEDDFVIT 
DDHIKALSKFGNAGNEGEVPFKPSRVILQDFTGVPAVVDLASLRKAMNDVGGDINKINPE 
VPVDLVIDHSVQVDSYANPEALERNMKLEFERNYERYQFLNWATKAFDNYNAVPPATGIV 

HQVNLEYLANVVHVRDVDGEKTAFPDTLVGTDSHTTMINGIGVLGWGVGGIEAEAGMLGQ
PSYFPIPEVIGVRLTHSLPQGSTATDLALRVTEELRKKGVVGKFVEFFGPGVQHLPLADR 
ATIANMAPEYGATCGFFPVDEESLKYMKLTGRDEEHIELVKEYLQQNHMFFDVEKEDPEY 
TDVIDLDLSTVEASLSGPKRPQDLIFLSDMKKEFEKSVTAPAGNQGHGLDQSEFDKKAEI
NFNDGSKATMKTGDIAIAAITSCTNTSNPYVMLGAGLVAKKAVEKGLKVPEFVKTSLAPG 

SKVVTGYLRDSGLQQYLDDLGFNLVGYGCTTCIGNSGPLLPEIEKAVADEDLLVTSVLSG
NRNFEGRIHPLVKANYLASPQLVVAYALAGTVDIDLQNEPIGKGKDGKDVYLQDIWPSIQ 
EVSDTVDKVVTPELFLEEYKNVYHNNEMWNEIDVTDEPLYDFDPNSTYIQNPTFFQGLSK 
EPGKIEPLKSLRVMGKFGDSVTTDHISPAGAIGKDTPAGKYLLDHDVAIRNFNSYGSRRG 
NHEVMVRGTFANIRIKNQLAPGTEGGFTTYWPTGEIMPIYDAAMKYKEDGTGLVVLAGND

YGMGSSRDWAAKGTNLLGVKTVIAQSYERIHRSNLVMMGVLPLQFQQGESAEALGLDGKE
EISVDINEDVQPHDLVNVTAKKENGEIINFKAIVRFDSLVELDYYRHGGILQMVLRNKLA
Q*

Sequence 343
Contig_0469_pos_4174_4641,

putative peptide of unknown function 

atgatatacagtttgactgaaattgaagcaagatatcaagaaaccgataaaatgggggtt 
atctatcacggtaactacgcaacatggtttgaagttgcgagaacagactatataagaaag 
cttggcttcagttatgcctctatggaagaacaaggtgttatttcaccagttgtagattta 
aaagtgcaatataaaaaatcaatttactatcctgaaaaggtgacagtaaaaacatgggtg
gaaaaatattctagattacgttcaacttattgttatgaggtttataatgaaaatggagag 
ttagctactactggttcaacagaacttatctgtattaaagcagatacatttaaacccata 
cgcttggatagatattttcctgagtggcatgagacttatagtaaagttaaccagttaaat 
aaagaaggtaaagatgctgaggttacgtttggcattaatcatttataa 


Sequence 344

MIYSLTEIEARYQETDKMGVIYHGNYATWFEVARTDYIRKLGFSYASMEEQGVISPVVDL
KVQYKKSIYYPEKVTVKTWVEKYSRLRSTYCYEVYNENGELATTGSTELICIKADTFKPI
RLDRYFPEWHETYSKVNQLNKEGKDAEVTFGINHL* 
Sequence 345

Contig_0469_pos_6050_8044,
is similar to (with p-value 0.0e+00)
>sp:sp|P50072|PARE_STAAU TOPOISOMERASE IV SUBUNIT B (EC 5.99
.1.-). >gp:gp|D67075|D67075_1 Staphylococcus aureus DNA for
DNA topoisomerase IV GrlB subunit, DNA topoisomerase IV GrlA
subunit, complete cds. NID: g1777319. >gp:gp|L25288|STAGYRA
SL_1 Staphylococcus aureus gyrase-like protein alpha and bet
a subunit (grlA and grlB) genes, complete cds. NID: g561878.
>gp:gp|A48501|A48501_1 Sequence 3 from Patent WO9603516. NI
D: g2302280.

atgaataaacaaaataattattcagatgattcaattcaggtacttgaaggactagaagca 
gttaggaagagacctggtatgtacattggatcaactgataaacgaggattacatcatctt
gtatatgaagttgtcgataactccgtcgatgaagtattaaatggttatggtgatgcgatt
acagtaacaattaatcaggatggtagtatttctatagaagataatggtcgaggtatgcca 
acaggtatacatgcgtctggcaaacctactgcagaagttatatttactgttttacatgct 
ggaggtaaatttggacaaggaggttataaaacatctggaggtctccatggggtgggtgct 
tctgtagtaaatgcccttagtgaatggcttgaagttgaaattcatagagatggtaatatc
tacacacaaaatttcaaaaatggtggtattccagcgacaggtttagtaaaaactggaaaa
acaaaaaaaactggtactaaagttacatttaaaccagactcagaaatatttaagtcaacg 
acgacttttaattttgatattttaagtgagcgtttacaagaatctgcatttttacttaaa
gatttaaaaattacacttactgatttacgtagtggaaaagaacgagaagaaatttaccat 
tacgaagaaggaattaaagaatttgttagttatgtcaatgaaggtaaagaagtattacat 

gatgttactacatttgcagggcattccaatggaatagaggtagacgtagcattccaatat
aatgttcagtactctgagagcatattaagttttgtaaataatgttcgtacaaaggacgga 
ggtactcatgaagttggtttcaaaacggcgatgactcgtgtttttaatgaatatgcacgt 
cgtataaacgaactgaaagataaagataaaaatttagacggtaatgatatacgcgaaggt 
ttaacagcgataatttcagtacgtataccagaagaacttcttcaatttgaagggcaaacg 

aaatcaaaacttggcacttcagaagcaaggagtgctgtagactctgttgtttcagaaaaa
ttaccatattacttagaagaaaagggccaattatctaaatcattagttaaaaaagcaatt 
aaagctcaacaagcacgcgaggctgctcgtaaagctagagaagatgcacgctccggaaag 
aaaaataaacgtaaagatacattgttatcaggtaagttaactcctgcgcaaagtaaaaat 
actgataaaaacgagttatatctagttgagggtgattcagcgggaggttctgcaaaattg 

ggacgcgaccgtaaattccaagctattttacctcttcgtggaaaggttattaatacagaa
aaggcacgtttagaggatatttttaaaaatgaagaaattaatacgattattcatactatt 
ggtgctggtgttggtactgactttaaaattgaggatagtaattacaacagaattattatc 
atgacagatgctgatacggatggtgcacatattcaagtattattgcttacatttttcttt 
aaatatatgaaaccacttgttcaagctggacgtgtctttattgcgttaccgcctttatac 

aaattagaaaaaggcaaaggtaagaataaaaaagttgagtacgcttggactgatgaagaa
ttagaaaatttacaaaagcaattaggaaaaggtttcatattacagcgttataaaggtctt 
ggtgaaatgaatccagaacaattatgggaaactaccatgaatccagaaactcggacatta 
attagagttcaagttgaagatgaagttcgttcatcaaaacgtgtcactactttgatgggg 
gataaggttgccccacgaagagagtggattgaaaaacacgttgaatttggtatgcaagaa 

gatcaaagcattttggataataaagaagtccaaatactagagaatgaaaaatatattgag
gaggaaacgaattga 

Sequence 346

MNKQNNYSDDSIQVLEGLEAVRKRPGMYIGSTDKRGLHHLVYEVVDNSVDEVLNGYGDAI 
TVTINQDGSISIEDNGRGMPTGIHASGKPTAEVIFTVLHAGGKFGQGGYKTSGGLHGVGA
SVVNALSEWLEVEIHRDGNIYTQNFKNGGIPATGLVKTGKTKKTGTKVTFKPDSEIFKST
TTFNFDILSERLQESAFLLKDLKITLTDLRSGKEREEIYHYEEGIKEFVSYVNEGKEVLH
DVTTFAGHSNGIEVDVAFQYNVQYSESILSFVNNVRTKDGGTHEVGFKTAMTRVFNEYAR 
RINELKDKDKNLDGNDIREGLTAIISVRIPEELLQFEGQTKSKLGTSEARSAVDSVVSEK
LPYYLEEKGQLSKSLVKKAIKAQQAREAARKAREDARSGKKNKRKDTLLSGKLTPAQSKN
TDKNELYLVEGDSAGGSAKLGRDRKFQAILPLRGKVINTEKARLEDIFKNEEINTIIHTI 
GAGVGTDFKIEDSNYNRIIIMTDADTDGAHIQVLLLTFFFKYMKPLVQAGRVFIALPPLY
KLEKGKGKNKKVEYAWTDEELENLQKQLGKGFILQRYKGLGEMNPEQLWETTMNPETRTL
IRVQVEDEVRSSKRVTTLMGDKVAPRREWIEKHVEFGMQEDQSILDNKEVQILENEKYIE
EETN*

Sequence 347 

Contig_0469_pos_8185_10443,
is similar to (with p-value 0.0e+00)

>sp:sp|P50073|PARC_STAAU TOPOISOMERASE IV SUBUNIT A (EC 5.99
.1.-). >gp:gp|D67075|D67075_2 Staphylococcus aureus DNA for
DNA topoisomerase IV GrlB subunit, DNA topoisomerase IV GrlA
subunit, complete cds. NID: g1777319.

atgtattcaagtgggaatacgtatgataaaaatttccgtaaaagtgcgaaaactgtcggt
gatgtaataggtcaatatcatcctcatggagactcttcagtatatgatgctatggtgcgc 
ttaagtcaagattggaagttacgtcatgttctaattgaaatgcatggtaataatggtagt 
atcgataacgatcctccagctgctatgcgttacacagaagctaaacttagtcaattatca 
gaagaactattaagggatattaataaggaaacagtatcatttattccaaactatgatgac 

acaactttggaaccaatggtattaccagcgagatttcctaatttattaattaatggatct
acggggatttcttcaggatatgctactgatatcccgccgcataacctcgccgaagtaata 
caaggcacattgaagtatatcgatcaacctgatattacaattaatcaactgatgaaatat 
atcaaagggcctgactttcctacaggtggtatcattcaaggaatagaaggtataaaaaaa 
gcgtatgagaccggtaaaggaaaggttgtcgtgcgttcacgagtagatgaagagccttta 

agaagtggacgtaaacaattaattgtgactgaaattccgtatgaagtgaataaaagtagt
ttagttaaaagaattgacgaattacgtgccgataaaaaggttgatggtattgtagaagtt 
cgagatgagactgatagaactggattacgaattgcaatcgaattaaaaaaagatgctaat 
agcgaatcaatcaaaaactatttatataagaattcggatttacaaatttcatataatttt 
aatatggttgctattagtgaaggtcgccctaagttgatgggattacgtgaaattatagaa 

agttatttaaatcatcaaattgaagtggttacaaatagaacgcgttatgacttagagcaa
gctgaaaaacgtatgcatattgtggaaggattaatgaaagctttatctatacttgatgaa 
gttattgcattgatacgtaattctaaaaataaaaaagatgctaaagataatttagttgca 
gagtatgactttactgaagctcaagcagaagctattgtcatgttacagctgtatagatta 
acaaatactgacattgaagctttgaaaaaagaacatgaagagttagaagctttaataaaa 

gaattaagaaatatcttagataatcatgaggcacttttagcagtaattaaagatgaacta
aatgaaattaaaaagaaatttaaagtggatcgactatctacaatcgaagctgaaatttcc 
gaaatcaaaattgataaagaagttatggtgcctagtgaagaagtgattttaagtttgacg 
caacatggctatataaaacgtacatctacacgtagttttaacgcaagtggtgtgactgaa 
atcggtttgaaggacggcgaccgtttattaaaacatgaaagcgtgaatactcaagatact 

gttcttgtatttacaaataaaggtagatatttgtttatacctgttcataaattagccgat
atccgttggaaagagcttggtcaacacatatcacaaattgtgccaatagatgaagatgaa 
gaagtggtaaatgtatacaacgaaaaagattttaaaaatgaagccttttatattatggct 
acaaaaaacggcatgattaagaaaagtagtgcttcacaatttaaaactactcggtttaat 
aaaccactcataaatatgaaggttaaagacaaagatgaacttattaatgtcgttcgatta 

gagtctgatcagttaattactgttctaacccataaaggcatgtcattaacttattcaact
aatgaattatcggatacaggcttaagagcagctggtgttaaatcaattaatcttaaagat
gaagactatgttgttatgacagaagatgtgaacgactcagattccataataatggttaca 
caacgtggtgctatgaagcgtattgattttaatgttcttcaagaagctaaacgcgcacaa 
cgtggaattactttactaaaagaattaaagaaaaaaccgcatcgaattgtggcaggtgca 

gtagttaaagaaaatcacacgaaatatattgtattctctcaacatcatgaagaatatggt
aatatcgatgatgtacacttatctgaacaatatactaatggatcatttattattgatact 
gatgattttggagaagtagaaagtatgattctagagtaa 

Sequence 348

MYSSGNTYDKNFRKSAKTVGDVIGQYHPHGDSSVYDAMVRLSQDWKLRHVLIEMHGNNGS
IDNDPPAAMRYTEAKLSQLSEELLRDINKETVSFIPNYDDTTLEPMVLPARFPNLLINGS
TGISSGYATDIPPHNLAEVIQGTLKYIDQPDITINQLMKYIKGPDFPTGGIIQGIEGIKK
AYETGKGKVVVRSRVDEEPLRSGRKQLIVTEIPYEVNKSSLVKRIDELRADKKVDGIVEV
RDETDRTGLRIAIELKKDANSESIKNYLYKNSDLQISYNFNMVAISEGRPKLMGLREIIE

SYLNHQIEVVTNRTRYDLEQAEKRMHIVEGLMKALSILDEVIALIRNSKNKKDAKDNLVA
EYDFTEAQAEAIVMLQLYRLTNTDIEALKKEHEELEALIKELRNILDNHEALLAVIKDEL 
NEIKKKFKVDRLSTIEAEISEIKIDKEVMVPSEEVILSLTQHGYIKRTSTRSFNASGVTE 
IGLKDGDRLLKHESVNTQDTVLVFTNKGRYLFIPVHKLADIRWKELGQHISQIVPIDEDE
EVVNVYNEKDFKNEAFYIMATKNGMIKKSSASQFKTTRFNKPLINMKVKDKDELINVVRL 
ESDQLITVLTHKGMSLTYSTNELSDTGLRAAGVKSINLKDEDYVVMTEDVNDSDSIIMVT
QRGAMKRIDFNVLQEAKRAQRGITLLKELKKKPHRIVAGAVVKENHTKYIVFSQHHEEYG
NIDDVHLSEQYTNGSFIIDTDDFGEVESMILE*

Sequence 349

Contig_0469_pos_10802_12193,

is similar to (with p-value 0.0e+00)

>sp:sp|Q45068|ALST_BACSU AMINO ACID CARRIER PROTEIN ALST. >g
p:gp|Z73234|BC170DEGR_21 B.subtilis DNA (26.2 kb fragment; 1
70 degree region). NID: g1405443. >gp:gp|Z99113|BSUB0010_105
Bacillus subtilis complete genome (section 10 of 21): from
1781201 to 2014980. NID: g2634090.

atgttaccagagatgtttagagcattaactgaaaagccagaaactttaagtagtggtgag 
aagggtatttcaccatttcaagcttttgcgattagtgctgggtcaagagtaggaactgga
aatattgccggtgttgcaactgctattgttcttggtggccccggtgcagtcttctggatg
tggattattgcttttattggtgcagctagtgcatttatggaagcaacgcttgctcaagtt 
tataaggtacatgacaaagaaggtggattccgtggcggaccagcctattacataacaaaa 
gggctaaaccaaaaatggcttggaattgtatttgctgttttaattacagttacatttgct 
tttgtatttaatactgttcaagcgaatacaattgctgaatcattaaatacacaatacaat 

attagcccggtaattactggaatagtacttgcagttattacaggtattatcatctttggt
ggtgttcgtagcatagctacactatcttcacttattgtgcctattatggctattgtttat 
ataggtatggttttaatcattttattactcaatatagatcaaattgtacctatgattggc 
actattattaaaagtgcattcggagttcagcaggttactggtggtgctgtaggagctgct 
attcttcaaggtattaaacgtggtttattctcaaacgaagctggtatgggatctgcacct 

aatgctgctgctacatctgctgtgccccatcccgttaaacaaggtttaattcaatcatta
ggtgtattctttgacactatgcttgtttgtacagctacagcaattatgattttattatat 
tctggtttgcaatttggtgatagcgcgcctcaaggtgtagcagttacgcaatcagcgttg 
aacgaacatttaggttcagcaggaggtattttcttaactgtagcagttaccttatttgca 
ttttcatctgttgtaggtaactattactatggacaatccaatattgaatttttatctaac 

aataagatgatattatttatttttagatgttttgtagtacttttagtatttgtaggtgct
gttgctaaaacagaaacagtttggagtactgccgatttatttatgggtcttatggcaata
gtaaatatcatatcaattataggtttgtcgaatattgcgtttgcagtgatgaaagattat 
caaagacaggagataatcataaaaattagatactacagtcatagtcagattaattgtctt 
aggactaatgcttttttcttttcggtgataagagaggacattctcatattcaaaaggtgt 

tttcatccatcttataaaatcaacaaagttatcaaaactaacttctttatagcatatctt
tttactttctaa 

Sequence 350 

MLPEMFRALTEKPETLSSGEKGISPFQAFAISAGSRVGTGNIAGVATAIVLGGPGAVFWM 
WIIAFIGAASAFMEATLAQVYKVHDKEGGFRGGPAYYITKGLNQKWLGIVFAVLITVTFA
FVFNTVQANTIAESLNTQYNISPVITGIVLAVITGIIIFGGVRSIATLSSLIVPIMAIVY 
IGMVLIILLLNIDQIVPMIGTIIKSAFGVQQVTGGAVGAAILQGIKRGLFSNEAGMGSAP 
NAAATSAVPHPVKQGLIQSLGVFFDTMLVCTATAIMILLYSGLQFGDSAPQGVAVTQSAL 
NEHLGSAGGIFLTVAVTLFAFSSVVGNYYYGQSNIEFLSNNKMILFIFRCFVVLLVFVGA
VAKTETVWSTADLFMGLMAIVNIISIIGLSNIAFAVMKDYQRQEIIIKIRYYSHSQINCL
RTNAFFFSVIREDILIFKRCFHPSYKINKVIKTNFFIAYLFTF* 

Sequence 351 

Contig_0469_pos_12403_11954, 

putative peptide of unknown function

gtgtatacttatactattaaaattagaaatgagattattatgaaaatagtagaagtaaaa 
tctaagaatggtaccaattttatgattttagatggtaataatgaacctatagtagatgca 
gtaagatatttgaagtatctggatagtgttaagaaaagtttaaataccaagaaaacctat 
gcctatgcactaaaaaatttttttgtttacttagaaagtaaaaagatatgctataaagaa 

gttagttttgataactttgttgattttataagatggatgaaaacaccttttgaatatgag
aatgtcctctcttatcaccgaaaagaaaaaagcattagtcctaagacaattaatctgact 
atgactgtagtatctaatttttatgattatctcctgtctttgataatctttcatcactgc 
aaacgcaatattcgacaaacctataattga 
Sequence 352

VYTYTIKIRNEIIMKIVEVKSKNGTNFMILDGNNEPIVDAVRYLKYLDSVKKSLNTKKTY 
AYALKNFFVYLESKKICYKEVSFDNFVDFIRWMKTPFEYENVLSYHRKEKSISPKTINLT 
MTVVSNFYDYLLSLIIFHHCKRNIRQTYN* 



Sequence 353 

Contig_0470_pos_4232_4603,

is similar to (with p-value 1.0e-23)

>gp:gp|D85752|D85752_8 Enterococcus faecalis plasmid pPD1 ba
cA, bacB, bacC, bacD, bacE, bacF, bacG, bacH and bacI genes,
complete cds. NID: g2879906.
atgatagtaattactgatgcaataccaataattattccaatcatcgtaaagatattacgt 

cgtttgttttttaaaatggaacgaatagcaactgatatgacatttgaaaagttattcacg
tgtcaacacctcttcctcttgcacacgcccatccaaaatatggataatacgatcagcttt 
ttccgccacttcacgatcatgcgttaccataataatagttgtattctgttctttgttcag 
ttttacgaaaagctccataatatcttgagatgtcttcgaatcaagagcgccagtaggttc 
atcggcaataataaacttagggtcattaataattgcccgggcaatagctacacgttgttg 

ctgccctcctga

Sequence 354 

MIVITDAIPIIIPIIVKILRRLFFKMERIATDMTFEKLFTCQHLFLLHTPIQNMDNTISF
FRHFTIMRYHNNSCILFFVQFYEKLHNILRCLRIKSASRFIGNNKLRVINNCPGNSYTLL
LPS*

Sequence 355 

Contig_0470_pos_6112_7041,

putative peptide of unknown function 

gtggtttattttgttgagttaaatatattagaatgtgagggagtatttgaattgaagaaa
ttagcagtgatatgtgcgtttacaatattaatattagctggttgtggtcttggtgatagt 
gataataatggaagctcaacgataaatgatgatcaacaatcaggatataaaagtaacaga 
gattcaaaatcaagtataagtagaaatcaaacagaagataatcagcaggacacacaacaa 
gatacccattcgaatagatactatgctcaagtttggttaactgctttagatagttataga 

ggtgaaagtgaccttccttttgacgatttagaaattgtacatcaaaatatttctaataaa
gttttagatccctatcacccagacgaatcagccaaactacctgaaggaacagaattgtta 
acagcaagtgttactgcagcaggttcagtttattataaaagtaatggagatggcacaatt 
acaatatatagtgtaccatcacatttccaagggagttggcgtgacgctgattactctaaa 
agagaatctcaacgcattatagatgatgctcgtacagttaagttatacaacgctagtgaa 

agtgaaatcaataagataagtcagatgatgaggactgaattttcagttggtgataattta
acagatgaagatgatacttctgaatctgaagatcaatcaagtagttctgatgaagcaacg 
gtgacacgaagtaatgttatcgatatagttgaagactacgaagggcatcaattagataca 
gacacatatatttacaaagaaccagaaaaagatagcgatggtagttgggggttctcattt 
acagataaagaaggccatttagaaggatcttatattatcgataaagatggagaagtaacg 

aagtatgatgaagatggagagccagaataa

Sequence 356 

VVYFVELNILECEGVFELKKLAVICAFTILILAGCGLGDSDNNGSSTINDDQQSGYKSNR 
DSKSSISRNQTEDNQQDTQQDTHSNRYYAQVWLTALDSYRGESDLPFDDLEIVHQNISNK 
VLDPYHPDESAKLPEGTELLTASVTAAGSVYYKSNGDGTITIYSVPSHFQGSWRDADYSK
RESQRIIDDARTVKLYNASESEINKISQMMRTEFSVGDNLTDEDDTSESEDQSSSSDEAT
VTRSNVIDIVEDYEGHQLDTDTYIYKEPEKDSDGSWGFSFTDKEGHLEGSYIIDKDGEVT
KYDEDGEPE* 

Sequence 357

Contig_0470_pos_10172_10480,
putative peptide of unknown function 

atgatggccaacacctttaatataaccaatacattttccatacgagcggcttcgttcatt 
ccgcgtgataatagtaatgcagttaaaataatcactacagcagcaatgatatcaatgaca 
ccaccgttacttccaaatggattagataatgatttaggtaaagaaatgcccaatggtgca
ataagacctcttaagttagcagaaaagcctgaagcaacgaaagcaacagcaataaagtat 
tctgctaaaagcgcccaaccggcaacccatccgaataattcaccaaaaagaacattaatc 
catgaataa 


Sequence 358 

MMANTFNITNTFSIRAASFIPRDNSNAVKIITTAAMISMTPPLLPNGLDNDLGKEMPNGA 
IRPLKLAEKPEATKATAIKYSAKSAQPATHPNNSPKRTLIHE*

Sequence 359

Contig_0470_pos_11394_12080,

is similar to (with p-value 8.0e-79)

>gp:gp|D78193|BACGNTZA_30 Bacillus subtilis 36kb sequence be
tween gntZ and trnY genes encoding 34 ORFs. NID: g1064780. >
gp:gp|Z99124|BSUB0021_142 Bacillus subtilis complete genome
atggaagaacttttcagccaaatcgacagaaacattaaggatttaaacggaattttagtg 
acacatgaacacatcgaccatattaaaggtcttggtgttttagcacgtaaatataaactt 
ccgatttacgcgaatgagaatacatggaaagcgatagagaagaaagatagccgcattcca 

atggatcagaaatttatctttaatccatatgaaacgaaatctcttgcaggatttgatata
gaatcatttaacgtgtcacatgacgcgattgatccacaattctacatcttccacaataac 
tataagaaatttacgatgataactgacactggttacgtttcagatcgtatgaaaggtatg 
attcaaggtagtgatgtctttatgtttgaaagtaatcacgatgtcgatatgttacgcatg 
tgtcgctatccatggaagacgaaacaacgtattttaagtgatatgggtcacgtatccaat 

gaagacgcgggtcttgcgatgagtgatgtcattacaggtaatacgaaacgtatatacctc
tctcatttgtcacaagacaataatatgaaagacctcgcacgcatgagtgttggacaagtg 
ctcaacgaacacgatatcgatacagagaaagaagtattgctttgcgataccgataaagca 
caagccacaccgatttatacactataa 

Sequence 360

MEELFSQIDRNIKDLNGILVTHEHIDHIKGLGVLARKYKLPIYANENTWKAIEKKDSRIP
MDQKFIFNPYETKSLAGFDIESFNVSHDAIDPQFYIFHNNYKKFTMITDTGYVSDRMKGM
IQGSDVFMFESNHDVDMLRMCRYPWKTKQRILSDMGHVSNEDAGLAMSDVITGNTKRIYL
SHLSQDNNMKDLARMSVGQVLNEHDIDTEKEVLLCDTDKAQATPIYTL*


Sequence 361 

Contig_0470_pos_13070_13600,
is similar to (with p-value 1.0e-48)
>gp:gp|D78193|BACGNTZA_15 Bacillus subtilis 36kb sequence be
tween gntZ and trnY genes encoding 34 ORFs. NID: g1064780. >
gp:gp|Z99124|BSUB0021_128 Bacillus subtilis complete genome
(section 21 of 21): from 3999281 to 4214814. NID: g2636442.
gtgtataacttgtggataactcgaataaatcatgtacttttggaggataaaatgaagatt 
actatcttatcagttggaaaactaaaagaaaaatattggaagcaagccattgcagaatat 

gaaaaaagattaggaccttacacgaaaatcgaattaatagaagtaccagatgaaaaagca
cctgaaaatatgagcgacaaagaaatagaacaagttaaagaaaaagaaggccaacgccta 
ctcaataagattaactcccaatctacagtaatcacgttggaaatcaaaggcaaaatggtg 
tcttcagaaggactcgctaaagaactgcaaacacgcatgacacaaggtcaaagcgacttt 
acatttgtcataggtggctccaatggtttacaccaagacgtcttacaacgcagcaactac 

gcactatcattcagcaacatgaccttcccacatcaaatgatgcgtgtaatattgattgaa
caaatttatcgcgcattcaaaatcatgagaggtgaagcgtatcataaatga 

Sequence 362 

VYNLWITRINHVLLEDKMKITILSVGKLKEKYWKQAIAEYEKRLGPYTKIELIEVPDEKA 
PENMSDKEIEQVKEKEGQRLLNKINSQSTVITLEIKGKMVSSEGLAKELQTRMTQGQSDF
TFVIGGSNGLHQDVLQRSNYALSFSNMTFPHQMMRVILIEQIYRAFKIMRGEAYHK*

Sequence 363 

Contig_0470_pos_15541_14693,






is similar to (with p-value 3.0e-75)

>sp:sp|P04188|STSP_STAAU GLUTAMYL ENDOPEPTIDASE PRECURSOR (E
C 3.4.21.19) (STAPHYLOCOCCAL SERINE PROTEINASE) (V8 PROTEINA
SE) (ENDOPROTEINASE GLU-C). >pir:pir|A26812|PRSASK glutamyl
endopeptidase (EC 3.4.21.19) precursor - Staphylococcus aure
us >gp:gp|Y00356|SASP_1 Staphylococcus aureus V8 serine prot
ease gene. NID: g46686.

atgaaaaagagatttttatctatatgtacaatgacaattgcagcgttagcaactactaca 
atggtaaatacttcttatgcaaaaaccgatacagaaagccataatcattcctcacttggc 

acagaaaacaaaaatgttttagatattaatagttcgagtcataatatcaaaccaagtcaa
aataaaagttacccaagtgtaatattacctaataataatagacatcaaatttttaatact 
acacaaggtcattatgatgctgttagttttatttatataccaatagatggtggatatatg 
agtggttcaggtgttgttgtaggtgaaaatgaaatattaactaataaacacgttgttaat 
ggagctaagggtaatccaagaaatattagtgtccatccttcagctaaaaatgaaaatgat 

tatcctaatggcaaatttgtgggtcaagaaatcataccgtatcctggtaatagtgattta
gcaatcttaagagtgtcaccaaacgaacataatcaacatattggtcaagtagttaaacct 
gcaactataagtagcaatacagacactagaattaatgaaaacatcactgttactggttac 
cctggtgacaaaccattagccacaatgtgggaaagtgtaggtaaagttgtctatattggt 
ggcgaggaattaagatatgacctaagtactgtaggtggaaactctggatctccagtattt 

aatggtaaaaatcaagttattggaatacattatggtggcgtagataataaatacaatagc
agtgtttatattaatgatttcgttcaacaattcctaagaaacaatatacctgatataaat 
attcagtaa 

Sequence 364 

MKKRFLSICTMTIAALATTTMVNTSYAKTDTESHNHSSLGTENKNVLDINSSSHNIKPSQ
NKSYPSVILPNNNRHQIFNTTQGHYDAVSFIYIPIDGGYMSGSGVVVGENEILTNKHVVN
GAKGNPRNISVHPSAKNENDYPNGKFVGQEIIPYPGNSDLAILRVSPNEHNQHIGQVVKP 
ATISSNTDTRINENITVTGYPGDKPLATMWESVGKVVYIGGEELRYDLSTVGGNSGSPVF 
NGKNQVIGIHYGGVDNKYNSSVYINDFVQQFLRNNIPDINIQ* 


Sequence 365 

Contig_0470_pos_14467_13679,

is similar to (with p-value 2.0e-43)

>gp:gp|U56999|TPU56999_1 Treponema pallidum methyl-accepting
chemotaxis protein (mcp-1) gene, complete cds, and potentia
l regulatory molecule (pfoS/R) gene, partial cds. NID: g1354
774.

atgagtcattcgcaaataaatggaaagcgattttttaacaacattttgaatgcagtagga 
gcaggggtagttattgcactgttacctaatgccttattaggtgaattattaaaattcttc 

aaagaaggtaatcatgtactagaaacgatttttcagctagtaacaatcatacaatctttt
atggcttttattataggggttcttgctgcgcaccaatttaaatttaaaggtacaggtgct 
gcaattattggtatttcagcaatgctaggttctggagctgtacactataatggacaaaca 
attgaattaaaaggaattggagatattataaatgtaattttagtagttatattagcgtgt 
ttcatttatatgtttttagaggggaaattaggttccttagaaatgattattttacccgtt 

ttagttcctgtaattagtggattaatagggttattaacattaccttacgttcaagttatt
acgcagtcactaggaaaattagtaaacaggtttacagaattaaatccattattaatgtct 
atattaatttgtgtaacattttctttattaatggtaactccaatctcgttagttgctata 
gcaacagcaattaaccttactggtttaggaagtggtgctgcaaatatgggaatagttgca 
gcttgtgtaacctttttatttggatctttaagagttaattctcttggagttaacgtggta 

ttactcataggtgctgctaaaatgatgattcctgtgtacttaaagcaaagcaaacataca
ttcaattag 

Sequence 366 

MSHSQINGKRFFNNILNAVGAGVVIALLPNALLGELLKFFKEGNHVLETIFQLVTIIQSF
MAFIIGVLAAHQFKFKGTGAAIIGISAMLGSGAVHYNGQTIELKGIGDIINVILVVILAC
FIYMFLEGKLGSLEMIILPVLVPVISGLIGLLTLPYVQVITQSLGKLVNRFTELNPLLMS
ILICVTFSLLMVTPISLVAIATAINLTGLGSGAANMGIVAACVTFLFGSLRVNSLGVNVV
LLIGAAKMMIPVYLKQSKHTFN* 






Sequence 367 

Contig_0470_pos_10545_9307,
is similar to (with p-value 3.0e-44)

>gp:gp|AL023702|SC1C3_2 Streptomyces coelicolor cosmid 1C3.
NID: g3169026.

gtggcaggtcttgtagcctttacttatgcagaaatggcatctacaatgccttttgctggt 
tcagcttattcatggattaatgttctttttggtgaattattcggatgggttgccggttgg 

gcgcttttagcagaatactttattgctgttgctttcgttgcttcaggcttttctgctaac
ttaagaggtcttattgcaccattgggcatttctttacctaaatcattatctaatccattt
ggaagtaacggtggtgtcattgatatcattgctgctgtagtgattattttaactgcatta 
ctattatcacgcggaatgaacgaagccgctcgtatggaaaatgtattggttatattaaag 
gtgttggccatcattttatttgtgattgttgggctaactgcgattaatttcagtaactat

ataccttttattccagaacataaagttactgaaactggcgactttggaggttggcaaggt
atttatgctggagtttcaatgatttttttagcttatattggttttgactctattgctgct 
aattcagctgaagcgattaatccacagaagacaatgcctagaggaatcttagggtcactc 
atagtagcaattgtattgtttgtggccgtagcacttgttcttgttggcatgttccactac 
tctcaatacgctgataatgcagagccagtaggttgggcattacgagaaagtggtcatggt

attattgctgcaattgttcaagcaatttctgtcatcggtatgttcactgcattaatcggt
atgatgcttgcaggttcacgtctattatattcatttggacgagatggtttactcccttct 
tggttaagtcaattgaatcacaaacatttacctaatcgagcacttgtcatacttacaatc 
attggcgtagttatcggatcaatgttcccgtttgctttcttagcacaattgatttccgca 
ggtacccttgttgcattcatgtttgtgtcactagcaatgtatcgattaagaaaacgtgaa 

gggaaagatttacctaagccagagtttaaattacctttatatcctattttgcctgcaatt
acatttatattagtattgctagtattttggggattaagttttgaagctaagttgtataca
ctgatatggtttattgtaggtataattatttatttaatttatggaattagacattccaaa
aagaatgatgaagaagcgtatcaagtacctagagaataa 

Sequence 368

VAGLVAFTYAEMASTMPFAGSAYSWINVLFGELFGWVAGWALLAEYFIAVAFVASGFSAN
LRGLIAPLGISLPKSLSNPFGSNGGVIDIIAAVVIILTALLLSRGMNEAARMENVLVILK
VLAIILFVIVGLTAINFSNYIPFIPEHKVTETGDFGGWQGIYAGVSMIFLAYIGFDSIAA
NSAEAINPQKTMPRGILGSLIVAIVLFVAVALVLVGMFHYSQYADNAEPVGWALRESGHG 

IIAAIVQAISVIGMFTALIGMMLAGSRLLYSFGRDGLLPSWLSQLNHKHLPNRALVILTI
IGVVIGSMFPFAFLAQLISAGTLVAFMFVSLAMYRLRKREGKDLPKPEFKLPLYPILPAI
TFILVLLVFWGLSFEAKLYTLIWFIVGIIIYLIYGIRHSKKNDEEAYQVPRE*

Sequence 369 
Contig_0470_pos_7576_7145,

putative peptide of unknown function 

atgcattggttaaaaatattttatcatttattatgcgcaaccacgattagcgtgatatta
cttattataactatattaatggatgcgttactacaaaacacacacttaactcagttatta 
ctcaatattgattttttaatcaaccctgatgaagtgccaacaattattgaagtactgatt 
catttaagtattggaatattgatttatctcgcctttttaattatctatcattattcaaaa
tccttgtatcatctagcatacttacctttagtattgatatttactttgatgtatccactt 
ctcgtttttcttgcgcaacgtccatttttttcctttagttggaacgaatttgcatggtgg 
ttagttgcacatctttttttcatcattttaatggcgacttgtctacctatcatttcgaaa 
aaaattttatga 


Sequence 370 

MHWLKIFYHLLCATTISVILLIITILMDALLQNTHLTQLLLNIDFLINPDEVPTIIEVLI 
HLSIGILIYLAFLIIYHYSKSLYHLAYLPLVLIFTLMYPLLVFLAQRPFFSFSWNEFAWW
LVAHLFFIILMATCLPIISKKIL*



Sequence 371 

Contig_0470_pos_4275_3154,
is similar to (with p-value 5.0e-41)

>gp:gp|D85752|D85752_9 Enterococcus faecalis plasmid pPD1 ba
cA, bacB, bacC, bacD, bacE, bacF, bacG, bacH and bacI genes,
complete cds. NID: g2879906.
atgattggaataattattggtattgcatcagtaattactatcatgtcgttggggaacggt
tttaagaagtcaacgactgagcaattcaatgatgctggtgctggtaaaaatcaagcttca 
atttcttacatgacagaaaatatggaagcgcctaaaaataatccatttaagcaagaggat 
atgagtgttgttgaacaggttaatggtgttaagagtgctaaagtaaaagaggataaggat 
agcacatattcagtcaaaattacgaatacacatggcagtagtgatgctagtttaaaaaag 

gttgataaactgacagatgtagatgaaggaaaaggatttacgaatgatgataatgaagtg
ctagaaaaagtagccgttatagataaaaaaattgctaaaaaagtattcaataatcaggca 
atgggtcaatctatttatataaatggagaagggtttaaagtcgtaggcgtctctgaaagc 
tcagaagtcgatgaaagtgggatgcctattgagtcattaattcaaataccttcaaaaaca 
tttaataaatatatgggcaatttgacacaaggtatgcctcaattattagttacagttgaa 

aaaggttcagataagaaagacgtaggtaaaaaggtcgaaaaagtgttgaataaaaaagga
actggcgtatctgaaggtcaatatagttatgaagataatgaagcggtgatgaaaacgata 
ggttcagtcttagacacgattacttactttgtcgcagctgttgcgggaatatcactcttt 
attgcaggtattggtgtgatgaatgtcatgtatatttcagtcactgaacgaacagaagag 
attgcaattcgtcgtgcatttggcgctaaaggtcgagatattgaaatacaattcttagta 

gaaagtgttgtgttatgtctcataggtggtatcatcggattaattctaggtattattatt
gctacattgattgatctcgtgacacctgaaatggttaagagttccgtcagtctaggttcc 
gtcatcctagctgtaggtgtatcaacattgataggcatcattttcggttggatacctgca 
cgttcagcttctaaaaaagaattaattgatattattaaataa 

Sequence 372

MIGIIIGIASVITIMSLGNGFKKSTTEQFNDAGAGKNQASISYMTENMEAPKNNPFKQED 
MSVVEQVNGVKSAKVKEDKDSTYSVKITNTHGSSDASLKKVDKLTDVDEGKGFTNDDNEV 
LEKVAVIDKKIAKKVFNNQAMGQSIYINGEGFKVVGVSESSEVDESGMPIESLIQIPSKT
FNKYMGNLTQGMPQLLVTVEKGSDKKDVGKKVEKVLNKKGTGVSEGQYSYEDNEAVMKTI
GSVLDTITYFVAAVAGISLFIAGIGVMNVMYISVTERTEEIAIRRAFGAKGRDIEIQFLV
ESVVLCLIGGIIGLILGIIIATLIDLVTPEMVKSSVSLGSVILAVGVSTLIGIIFGWIPA
RSASKKELIDIIK*

Sequence 373 
Contig_0471_pos_563_1228,

is similar to (with p-value 3.0e-83)

>gp:gp|AF068904|AF068904_2 Staphylococcus aureus cell divisi
on protein FtsZ (ftsZ) gene, partial cds; YlmD (ylmD), YlmE
(ylmE), YlmF (ylmF), YlmG (ylmG), and YlmH (ylmH) genes, com
plete cds; and cell division protein DivIVA (divIVA) gene, p
artial cds. NID: g4009490.

atggcgagatatatcagtgacagtgcacatcatattacacatcatcaagatatcttagcg 
aatcttattggttatccaagagatgaatgggtttttcctatacaaacacatgatagtcgt 
atcgttgaagttacaagtgaacataaaggaacaaatattgatgaactaactgatgattta 

catggcatagatggaatgtatacttttgattctcacattcttcttactatgtgttatgcg
gattgcgtacctgtatatttttatagtgaaccacatggatatataggattagcacatgca 
ggttggcgaggaacatatggtcaaatagtaaaagaaatgctaaaaaaagtggattttgat 
tatgaagacttaaagattgtaattggtccagcaacttcaaattcttatgaaatcaatgat 
gatataaaaaataagtttgaggaattaaccattgattcaactttatatattgagaccaga 

ggtaaaaatcaacatggtattgatttgaaaaaggctaacgcacttcttctagaagaagct
ggagttccatcaaaaaacatatacgttacggaatatgcaacttcagaaaacttagattta 
ttcttttcatatcgtgttgaaaaaggacagacgggacgtatgttagcatttattggacgg 
aagtaa 


Sequence 374 

MARYISDSAHHITHHQDILANLIGYPRDEWVFPIQTHDSRIVEVTSEHKGTNIDELTDDL 
HGIDGMYTFDSHILLTMCYADCVPVYFYSEPHGYIGLAHAGWRGTYGQIVKEMLKKVDFD
YEDLKIVIGPATSNSYEINDDIKNKFEELTIDSTLYIETRGKNQHGIDLKKANALLLEEA
GVPSKNIYVTEYATSENLDLFFSYRVEKGQTGRMLAFIGRK*

Sequence 375
Contig_0471_pos_1254_1922,

is similar to (with p-value 2.0e-92)

>gp:gp|AF068904|AF068904_3 Staphylococcus aureus cell divisi
on protein FtsZ (ftsZ) gene, partial cds; YlmD (ylmD), YlmE
(ylmE), YlmF (ylmF), YlmG (ylmG), and YlmH (ylmH) genes, com
plete cds; and cell division protein DivIVA (divIVA) gene, p
artial cds. NID: g4009490.

atggatgttaaagagaatcttgctaagattgaaaaggaaattgatgcaagcattaaaaaa 
agtgcgcattcagcacaacctcacgtgattgcagtaacaaaatatgttacaatagagcga 
gctagagaagcgtataaagtagggataagacatttcggtgaaaatcgattagatggattc 

aaagagaagaaagaatctctaccaagcgatgttaaattacatttcattggttctttacaa
tcaaggaaagtaaaagatattataaatgaagtcgattattttcatgctttagatcgttta 
agtctagctaaggagattaataaaagagcaaatcatgttataaaatgtttcttacaagta 
aatgtttctggagaagaatctaaacatggcatagctcttgaagaagtgaatcaatttata 
aatcaaattaaagaatatgaaaatatccaaattattggattaatgacgatggcaccattg 

actgatgatttatcgtacataagaaatttatttaaagaattaagacataaaagaaatgaa
attcaacaattcaatttagcacatgcaccttgtacagaattatctatgggaatgagtaat 
gattatcaaatggcagttgaagaaggtgcaacctttgtcagaattgggactaaacttgta 
ggagaatag 

Sequence 376

MDVKENLAKIEKEIDASIKKSAHSAQPHVIAVTKYVTIERAREAYKVGIRHFGENRLDGF 
KEKKESLPSDVKLHFIGSLQSRKVKDIINEVDYFHALDRLSLAKEINKRANHVIKCFLQV 
NVSGEESKHGIALEEVNQFINQIKEYENIQIIGLMTMAPLTDDLSYIRNLFKELRHKRNE 
IQQFNLAHAPCTELSMGMSNDYQMAVEEGATFVRIGTKLVGE* 


Sequence 377 

Contig_0471_pos_2176_2529,

is similar to (with p-value 5.0e-45)

>gp:gp|AF068904|AF068904_4 Staphylococcus aureus cell divisi
on protein FtsZ (ftsZ) gene, partial cds; YlmD (ylmD), YlmE
(ylmE), YlmF (ylmF), YlmG (ylmG), and YlmH (ylmH) genes, com
plete cds; and cell division protein DivIVA (divIVA) gene, p
artial cds. NID: g4009490.

atgaataataattcaaaaaataattctagaaacgttgtaacaatgaaccaagcatcacaa 
tcatatgccgctcaggaaagttcaaaaatgtgtctgtttgaaccacgtgtcttttcagat
actcaagatattgccgacgaattaaaaaacagacgtgcaactttagtaaatttacaacgc 
attgatcaagtatcagcaaagcgtattattgattttttaagtggtacggtatacgcaatt 
ggtggagatattcaacgcgtgggtactgatattttcttatgcacacctgataatgttgaa 
gtagccggtagtataactgatcacatcgagaatatggagcaacactacgaataa 

Sequence 378 

MNNNSKNNSRNVVTMNQASQSYAAQESSKMCLFEPRVFSDTQDIADELKNRRATLVNLQR 
IDQVSAKRIIDFLSGTVYAIGGDIQRVGTDIFLCTPDNVEVAGSITDHIENMEQHYE* 

Sequence 379

Contig_0471_pos_3052_4308,

is similar to (with p-value 1.0e-39)

>gp:gp|AF015775|AF015775_17 Bacillus subtilis YodA (yodA), Y
odB (yodB), YodC (yodC), YodD (yodD), ABC-transporter (yodE)
, permease (yodF), proteinase (ctpA), YodH (yodH), YodI (yod
I), carboxypeptidase (yodJ), purine nucleoside phosphorylase
(deoD), YodL (yodL), YodM (yodM), YodN (yodN), YodO (yodO),
YodP (yodP), acetylornitine deacetylase (argE), butirate-ac
etoacetate CoA transferase (yodR), butyrate acetoacetate-CoA






transferase (yodS), YodT (yodT), CgeE (cgeE), CgeD (cgeD),
CgeC (cgeC), CgeA (cgeA), CgeB (cgeB), YzxA (yzxA), UDP-gluc
ose epimerase (yodU), YodV (yodV), and YodW (yodW) genes, co
mplete cds; and YodZ (yodZ) gene, partial cds. NID: g2415383
. >gp:gp|Z99114|BSUB0011_133 Bacillus subtilis complete geno
me (section 11 of 21): from 2000171 to 2207900. NID: g263423
0.

atgggatatcaattggataaacgtcaatttgaaattttagatatgctagtgagatttaat 
actgaaagtccacctggacgtaatacagatccattgcaagatgaaatcgaaacgttactt 

aaacaactggatttttcaatacagagagaacagttatacgacaatgatagtgtgatagta
gctaccttaaaagggcacaatcctaaagcgccaaaactgatattgaatggacatgttgat 
gtagcttctgtagatgacgatcaatattggcagtatccaccttttaaacttaccaacaaa 
gaagaatggttatacggtcgtggcgttagcgatatgaaaggtggtatgtcttcattattc 
tacgtcttggagcaattacatcaagaggggcaacgtccagaaggtgatattattgttcaa 

tcagtagtcggtgaagaagtaggtgaagcaggaactaaacgtgcatgtgaaataggacct
aaaggtgacttagcccttgtcttagatacgagtgagaatcaagcacttgggcaaggtggc 
gtgattaccggatggattacagttaaaagtaaaaatacaatacatgatggtgcgcgtagt 
caaacgatacatgctggtgggggcttgtttggtgcaagtgccattgaaaaaatgacaaag 
gtgattcaatcgcttaatgaacttgaaaggcattgggctgtcatgaagaagagccctgga 

atgcctccaggtgcgaatacaattaacccagctgtcatagaaggtggacgtcaccctgca
tttattgcagatgaatgtcgattatggattactgttcattacttaccgaacgaaagttat 
gaatctgtagttaatgaaatagagcaatatttaaataaggttgcagaagcagatgtatgg 
ctcagagagaatccacttgaatttgaatggggtggtacatccatgattgaggataaagga 
gaaatcttcccaagtttcactgttccgacacatcatccaggttttaagcaattagaagaa 

gcacatgaacatattcataataaaaagcttgaacatggtatgagtacaactgtaactgat
ggaggttggacagcacattttggcattcccacgatattatatggcccaggtagtttagaa 
gaggcacatagtgtagatgagaaaataaaagcaaaggaattagctcaatatagtgatgtt 
ttatatacatttttaaaagagtggtatgcacacccacaatcctataaatcatcatag 

Sequence 380

MGYQLDKRQFEILDMLVRFNTESPPGRNTDPLQDEIETLLKQLDFSIQREQLYDNDSVIV 
ATLKGHNPKAPKLILNGHVDVASVDDDQYWQYPPFKLTNKEEWLYGRGVSDMKGGMSSLF 
YVLEQLHQEGQRPEGDIIVQSVVGEEVGEAGTKRACEIGPKGDLALVLDTSENQALGQGG
VITGWITVKSKNTIHDGARSQTIHAGGGLFGASAIEKMTKVIQSLNELERHWAVMKKSPG
MPPGANTINPAVIEGGRHPAFIADECRLWITVHYLPNESYESVVNEIEQYLNKVAEADVW
LRENPLEFEWGGTSMIEDKGEIFPSFTVPTHHPGFKQLEEAHEHIHNKKLEHGMSTTVTD
GGWTAHFGIPTILYGPGSLEEAHSVDEKIKAKELAQYSDVLYTFLKEWYAHPQSYKSS*

Sequence 381 
Contig_0471_pos_8552_0,

is similar to (with p-value 4.0e-39)

>sp:sp|P39640|YWFD_BACSU HYPOTHETICAL OXIDOREDUCTASE IN ROCC
-PTA INTERGENIC REGION (EC 1.-.-.-). >pir:pir|S39737|S39737
hypothetical protein - Bacillus subtilis >gp:gp|X73124|BSGEN
R_83 B. subtilis genomic region (325 to 333). NID: g413923. >
gp:gp|Z99123|BSUB0020_68 Bacillus subtilis complete genome (
section 20 of 21): from 3798401 to 4010550. NID: g2636240.
atggaacgtttagaaaacaaaatcgcagtgattactggtgcgagtactggtattggacaa 
gcatcggccgtggcgttagcaaaagaaggagcacatgtgttagcgcttgatatatcagat 

caattagaagaaactgtgcagtctattaatgataatggtgggaaagcaactgcatatcgc
gtagacatttcagatgataaacaagtcaaacaattctcagaaaaaatagcacaagaattt 
ggacatgtagatgttatttttaacaatgcgggtgtagataatggcgccggacgtattcat 
gaatatccagttgaagtgtttgataaaattatggctgttgatatgagaggaactttttta
gtaactaaatttttattacctttaatgatgaaacaaggtggttctattattaatacagct 

tcattctctgggcaagctgcggatttataccgttcagggtataatgctgctaagggcggt
gtcattaattttacaaaatctatcgctatagaatatggacgtgaaaatattcgtgctaat 
gctatagcacctggaacaatcgaaacaccacttgttgataatttagcaggtacatcagat 
gaagaagccggacaaacattccgagaaaatcaaaaatgggtaacaccattaggtcgacta 
ggaacaccggatgaagttgggaaacttgtagcctttttagcttccgatgatagttcattt 
ataactggtgaaactattcgtattgatggtggcgtgatggcttatacatggctacaacac
gcattttcattttttgtcgttgttttttcttattttcttgtgacgtatattggttaccga 
tatgttctacttttttattcattgcttgtcacctccttaagcatttcactcttcattaat 
acgttcttctttaatc 


Sequence 382 

MERLENKIAVITGASTGIGQASAVALAKEGAHVLALDISDQLEETVQSINDNGGKATAYR 
VDISDDKQVKQFSEKIAQEFGHVDVIFNNAGVDNGAGRIHEYPVEVFDKIMAVDMRGTFL
VTKFLLPLMMKQGGSIINTASFSGQAADLYRSGYNAAKGGVINFTKSIAIEYGRENIRAN 
AIAPGTIETPLVDNLAGTSDEEAGQTFRENQKWVTPLGRLGTPDEVGKLVAFLASDDSSF
ITGETIRIDGGVMAYTWLQHAFSFFVVVFSYFLVTYIGYRYVLLFYSLLVTSLSISLFIN
TFFFNX 

Sequence 383 
Contig_0472_pos_3883_3548,

putative peptide of unknown function 

atggtgctcgtacaatttccaccttggtttgattgtaacgtccaaaatataaattacatc 
ttatatgtgagaaaacaattaactgatattccgatgagcattgaatttagacatcaatca 
tggtttgacaatcagtataaagaacaaactttatccttcttaacacaacatcaaatcatt 
catgcagtggtagatgaacctcaagttaaagaggggagcgttcctttagtaaataggatt
actagtgaaattgcttttgtacgttatcatggacgtaatcattatggttggactaaaaaa 
gatatgactgatcaagaatggcgagatgtaagataa 

Sequence 384 

MVLVQFPPWFDCNVQNINYILYVRKQLTDIPMSIEFRHQSWFDNQYKEQTLSFLTQHQII
HAVVDEPQVKEGSVPLVNRITSEIAFVRYHGRNHYGWTKKDMTDQEWRDVR*

Sequence 385 

Contig_0473_pos_900_2051, 

is similar to (with p-value 8.0e-75)

>sp:sp|Q01444|YFF1_MYCMY HYPOTHETICAL PROTEIN IN FFH 5' REGIO
N (FRAGMENT). >pir:pir|S35480|S35480 hypothetical protein 1
- Mycoplasma mycoides (SGC3) >gp:gp|M91593|MYCSRPM54A_1 Myco
plasma mycoides SRPM54 gene, complete cds. NID: g150208.

atgaagcatgaacaagaattagaacgcatctccggtctcactcaagaagaagctgtgaaa
gaacagcttcaaagagttgaagaagaactgtcacaagatattgcaatacttgttaaagaa
aaagaaaaagaagcgaaagaaaaagttgataagacagctaaagaattacttgctacaact 
gtacaaagattagcagccgaacatacaactgaatcaactgtttcagtcgtaaatctgcct 
aacgatgaaatgaaaggtcgtatcataggtagagaaggtagaaatatacgcacattagaa 

acacttactggcatagatttaattattgatgacacaccagaagcagttattttatcaggt
tttgacccaattagacgtgaaattgctagaactgcactagttaatttggtttctgatgga 
cgtattcatcctggacgtattgaagatatggtcgaaaaagctagaaaggaagtagacgat 
atcattagagatgctggagaacaagctacctttgaaataaatgtacacaatatgcatcct 
gatttagtgaaaattttgggtcgattaaattatcgaactagttatggtcagaatgtactt 

aaacattcaattgaagttgcccacctttcaggtatgcttgcagcagaattaggagaggat
gttactttagctaaacgtgctggattattacatgatgttggtaaagccattgatcatgaa
gttgaaggtagtcacgtagaaataggtgttgaattagctaagaaatataatgaaaataac 
ataattattaatgctattcactcacatcatggtgatgttgaaccaacctctatcatttct 
attttagttgcagcagctgatgcattatcagcagcgcgaccaggtgcacgtaaagagaca 

cttgaaaattatattagaagacttgagagactcgaaacgttatctgaaagttatgatggg
gtagaaaaagcatttgctatacaagctggtagagagattcgtgtagtcgtctcacctgaa 
gaaattgatgatttaaaatcatatagattggcaagagatattaagaaccaaattgaagaa
gagttacaatatcctggacatatcaaagtgacagttgttcgagagactagagcaatagaa 
tatgctaaataa 


Sequence 386 

MKHEQELERISGLTQEEAVKEQLQRVEEELSQDIAILVKEKEKEAKEKVDKTAKELLATT
VQRLAAEHTTESTVSVVNLPNDEMKGRIIGREGRNIRTLETLTGIDLIIDDTPEAVILSG
FDPIRREIARTALVNLVSDGRIHPGRIEDMVEKARKEVDDIIRDAGEQATFEINVHNMHP 
DLVKILGRLNYRTSYGQNVLKHSIEVAHLSGMLAAELGEDVTLAKRAGLLHDVGKAIDHE
VEGSHVEIGVELAKKYNENNIIINAIHSHHGDVEPTSIISILVAAADALSAARPGARKET 
LENYIRRLERLETLSESYDGVEKAFAIQAGREIRVVVSPEEIDDLKSYRLARDIKNQIEE 
ELQYPGHIKVTVVRETRAIEYAK* 


Sequence 387 

Contig_0473_pos_2519_3313,

is similar to (with p-value 4.0e-37) 

>sp:sp|P47488|Y246_MYCGE HYPOTHETICAL PROTEIN MG246. >pir:pi
r|B64227|B64227 hypothetical protein MG246 - Mycoplasma geni
talium (SGC3) >gp:gp|U39703|U39703_14 Mycoplasma genitalium
section 25 of 51 of the complete genome. NID: g3844835.
atgagaatattgtttataggtgacatcgttggtaaagtgggcaggaaaatgattactact 
tatttacctaaaattaaacaaacttatcacccaacagtttctatagtaaacgctgaaaat 

gccgcacacggtaaaggattaacagaaaaaatttacaaacaacttttgagagaaggcgtg
gatttcatgactatgggtaatcatacatatggtcaaagagaaatttacgattttattgat 
gatgctcatcgaatggtgagacctgcaaattttcctgatgaagctccaggaacaggtatg 
agaataataaaaattaacgatattaaattggctattattaatttacaaggccgttcattt 
atgcaagacattgatgatccatttaaaaaggctgaccagctaatcgaagaagctcaaaaa 

tctacaccatatatatttgtagattttcatgctgaaactacatctgaaaaaaatgctatg
ggttggtatttagatggtagagtgagcgctgttgttggtactcacacacatattcaaact 
tctgatgatcgtatattacctcatggcacaggatatatcacagatgtcgggatgacaggt 
tattacgatggtattttaggtatcaatagagatgaagttattcaacgttttattactagt 
ttgccacaaaggcatgttgttccagatgatgggcgaggcgtattatcaggagttatcata 

gatttagataaagaaggtaaaacgactcaaataaaaagactgttaataaatgaggaccat
cctttccaaatttaa 

Sequence 388 

MRILFIGDIVGKVGRKMITTYLPKIKQTYHPTVSIVNAENAAHGKGLTEKIYKQLLREGV
DFMTMGNHTYGQREIYDFIDDAHRMVRPANFPDEAPGTGMRIIKINDIKLAIINLQGRSF
MQDIDDPFKKADQLIEEAQKSTPYIFVDFHAETTSEKNAMGWYLDGRVSAVVGTHTHIQT
SDDRILPHGTGYITDVGMTGYYDGILGINRDEVIQRFITSLPQRHVVPDDGRGVLSGVII 
DLDKEGKTTQIKRLLINEDHPFQI* 

Sequence 389

Contig_0473_pos_3372_5186,

is similar to (with p-value 4.0e-83) 

>pir:pir|S22396|S22396 pyruvate synthase (EC 1.2.7.1) - Halo
bacterium halobium >gp:gp|X64521|HHFEROXI_1 H.halobium gene
for pyruvate:ferredoxin oxidoreductase. NID: g43497.

gtggtttattttgttatcatagagtatgaattaataccacaggaggcatgtgatatgaaa 
tcacaaatatcatggaaagtgggcggtcagcaaggcgaaggtattgaatctaccggtgaa 
atctttgctactgcgatgaatagaaaaggttattttttgtatggatatagacacttttct 
agtcgtataaaaggtggccatactaataataagataagagtttcaaaatcgcctgtgcat 

gcgattagtgatgatttggatatactcattgcttttgaccaggaaacgattgaattaaat
catcatgaaatgagagaagatagtattataattgcggatgctaaagcaaaaccccaaaag 
ccagagaactgtgtggctcaattaattgagttaccattcactagcacggcaaaggaactt 
ggaacagcattaatgaagaatatggtggcaattggtgcgacatctgcactgatggattta 
aatacatcaacttttgaaactttaatcgataacatgttttcaaaaaaaggtaataaagtc 

gttgatatgaatatacaagcccttaatatgggttatgatttaatgaagcaacaagttacc
aacgttaatggagactttacattagagaatggtagcggtcatcctcatttatatatgata 
ggtaatgacgcaatcggattaggagcaatagcagctggatcaagatttatgtccgcttat 
ccaattacgccagcttctgaaattatggaatacatgattgccaatctacctaaagttgat
ggtactgttgttcaaactgaagatgaaatagcagcagcaacgatggcgattggagctaac 

tatgctggcgtacgaggctttacagcgagtgcgggtccaggtctttctttaatgatggaa
tctattggattgtctggtatgactgaaacgccattagtcattattaatactcaaagaggt 
ggtccttctactggcttaccaacaaagcaagaacaatcagatttaatgcaaatgatttat 
ggtacccatggtgatattccgaaaattgtcgttgctcctacagatgctgaagatgcgttt 
tatcttactatggaagcatttaatttagctgaagaataccaatgtccagtcattctgtta 
agtgatttacaattatcattaggaaaacaaactgttaaaacactcgattataataaaatc
gatattcgtcgtggagaaataatacagtcagatatcgagagagctgaagatgataaagca 
tactttaaaagatatgcattaacagctagtggcgtatcaccacgaccaataccaggtgtt 
aaaggtggtatacatcatgtaacaggtgttgaacataatgaagaagggaagccaagtgag 
gcgcctatgaatcgtcagaatcagatggaaaaacgaatgcgcaaaactgaaagcttggtt
atcaataatcctgtgttactcaatgaacatgaagacgaagcagatatactgtatatagga 
tttatatctactaaaggtgctattggagaaggtgcagaaagactagaacgacatggtgta 
aaagtgaatacgatgcatattcgacaattacatcctttccctaaagatattgttcaacaa 
gctattaataaagcttcgaaagtaatagttgcagaacataattatcaaggacaattatca 
agtattttaaaaatgaacacacaagttaatgataaattagttaatcaaacaaaatacgat
gggaaacctttcttaccttatgaaattgaagaaaaaggtttggaaattgctaaagagtta 
aaggagttggtgtaa 

Sequence 390 

VVYFVIIEYELIPQEACDMKSQISWKVGGQQGEGIESTGEIFATAMNRKGYFLYGYRHFS
SRIKGGHTNNKIRVSKSPVHAISDDLDILIAFDQETIELNHHEMREDSIIIADAKAKPQK 
PENCVAQLIELPFTSTAKELGTALMKNMVAIGATSALMDLNTSTFETLIDNMFSKKGNKV
VDMNIQALNMGYDLMKQQVTNVNGDFTLENGSGHPHLYMIGNDAIGLGAIAAGSRFMSAY 
PITPASEIMEYMIANLPKVDGTVVQTEDEIAAATMAIGANYAGVRGFTASAGPGLSLMME 

SIGLSGMTETPLVIINTQRGGPSTGLPTKQEQSDLMQMIYGTHGDIPKIVVAPTDAEDAF
YLTMEAFNLAEEYQCPVILLSDLQLSLGKQTVKTLDYNKIDIRRGEIIQSDIERAEDDKA 
YFKRYALTASGVSPRPIPGVKGGIHHVTGVEHNEEGKPSEAPMNRQNQMEKRMRKTESLV 
INNPVLLNEHEDEADILYIGFISTKGAIGEGAERLERHGVKVNTMHIRQLHPFPKDIVQQ 
AINKASKVIVAEHNYQGQLSSILKMNTQVNDKLVNQTKYDGKPFLPYEIEEKGLEIAKEL 

KELV*

Sequence 391 

Contig_0473_pos_6639_7256,
is similar to (with p-value 2.0e-18) 
>pir:pir|S41182|S41182 hypothetical protein 37.1 - phage SPP
1 >pir:pir|S43808|S43808 hypothetical protein 38 - phage SPP
1 >gp:gp|X67865|BSSPP1_10 B.subtilis phage SPP1 DNA sequence
coding for products required for replication initiation. NI
D: g472886.

atggacaaatttaaatctatgacagaattaaaagaattgactaaagaaggaaaagattgg
gaaatagagtgtgaaaatcgttctagcatagtcactatattagcattacatggcggtgga 
attgaacctgccacaactgaattagcctatacaattgcacattgtggcgactataactat 
ttttcctttaaaggtatgagaagtaaggggaataatgagttacatgtgacttccacacat 
tatgatgaccaaattgcattagatttagtgagaggtagccaaagaactgtagccatccat 

ggttgtgaaggtaatgaaagtgtggcttatataggaggtagtgatgacagactaattgag
ttaatcaccgaatctcttgaagatataggaattagcgtgcgagaagcaccacatcatatt 
tctggaactcaagaaaataatattgttaatatgactcaaacccaaggaggagtgcaatta 
gaactgacagctcagttaagaaaggagctatttaaaaatagaaaaagttcacgcaaaaac 
cgtgaaaataaagataattgggatgatttaatgtacgactttgctgatgcaatgaaaaaa 

gctatagaacgtgcataa

Sequence 392 

MDKFKSMTELKELTKEGKDWEIECENRSSIVTILALHGGGIEPATTELAYTIAHCGDYNY 
FSFKGMRSKGNNELHVTSTHYDDQIALDLVRGSQRTVAIHGCEGNESVAYIGGSDDRLIE 
LITESLEDIGISVREAPHHISGTQENNIVNMTQTQGGVQLELTAQLRKELFKNRKSSRKN
RENKDNWDDLMYDFADAMKKAIERA* 

Sequence 393 

Contig_0473_pos_8529_9131,
putative peptide of unknown function

atgaaaaatgtttctaaagctttgatttggtttgttataagcttcatcatctttcacgca 
atattatttgtgatgtggggagaacatcaagaatactggtatttatatactggcattatg 
ttaatagctggaataagttatgttttttaccaaagagacattgcatctaaacgattatta 
acttccataggcatgggtataataacgagtgtcgcacttattattatacaattaattttt 
tcacttatttcatcagaattatcatacgcatctttaatcaaagaattatcacgaacgggt
gtctactttaaatggcaaatgctcgttactttattatttgtgataccttgtcatgaatta 
tatatgagaactgttttacaaaaggaattaataaaatataacttaccgaaatgggctagc 
attttaattgttgcaatatgttcaagttcattatttatatacttagataattggtggatt
gtattctttatttttgtagctcaattcattctatctcttagctatgaatatacgagacgt
attgctacgactacaattggtcaaattgtggctatcattttattattgatattccacgga 
taa 

Sequence 394 

MKNVSKALIWFVISFIIFHAILFVMWGEHQEYWYLYTGIMLIAGISYVFYQRDIASKRLL
TSIGMGIITSVALIIIQLIFSLISSELSYASLIKELSRTGVYFKWQMLVTLLFVIPCHEL
YMRTVLQKELIKYNLPKWASILIVAICSSSLFIYLDNWWIVFFIFVAQFILSLSYEYTRR
IATTTIGQIVAIILLLIFHG* 

Sequence 395

Contig_0474_pos_2713_1850,

putative peptide of unknown function 

gtgttaataatgaatgtcttccaaatgagagataaattgaaagcgcgtttaaaacattta 
gacgtagaattcaagtttgatagagaagaagaaacgttacgtattgtaagaattgacaat

cacaaaggtgtaacgattaaacttaacgctatcgtcgcaaaatatgaagaacaaaaagaa
aaaattatagatgaaatttgttattatgtcgaggaagcaatcgctcagatgggtgatgaa 
gtgattaataatgttgaggacatacaaattatgccggttataagagctacaagtttcgac 
aaagaaactaaggaaggtcatgcatttgtgttaacagaacatactgctgaaactaatata 
tattacgctcttgatctagggaaatcttatcggctaatagatgaaaatatgttacaaacg 

ttaaatttaactgctcaacaagtgaaagaaatgtcactatttaatgttcgtaagttagag
tgtcgctatagtacggatgaagttaaaggtaatattttttacttcatcaacacaaatgat 
ggatatgatgcaagtcgtattttaaatacttcttttttaaatcatattcaacaccaatgt 
gaaggtgaaatgcttgttggtgtgccacatcaagatgtattaattcttgcagatattaga 
aataaaacaggttatgatgttatggctcatttgactatggaattctttactaaaggactt 

gttccgattacttctttatcatttggttatgataacggacatctagagccaatatttatt
ttggggaaaaataataaacaaaaaagagatcctaacgttattcaacgtttagaagcgaac
agaaaaaaattcaaaaaagattaa 

Sequence 396 

VLIMNVFQMRDKLKARLKHLDVEFKFDREEETLRIVRIDNHKGVTIKLNAIVAKYEEQKE
KIIDEICYYVEEAIAQMGDEVINNVEDIQIMPVIRATSFDKETKEGHAFVLTEHTAETNI 
YYALDLGKSYRLIDENMLQTLNLTAQQVKEMSLFNVRKLECRYSTDEVKGNIFYFINTND 
GYDASRILNTSFLNHIQHQCEGEMLVGVPHQDVLILADIRNKTGYDVMAHLTMEFFTKGL 
VPITSLSFGYDNGHLEPIFILGKNNKQKRDPNVIQRLEANRKKFKKD* 




Sequence 397 
Contig_0474_pos_0_1210, 

putative peptide of unknown function

atgagctggtttgataaattatttggcgatgacaacggttcgaatgacgatttgttacgc 
aaaaataaaaatagacgtcagtctcagcaatcaaaacaaaataatcaagactcattactg 
cctcaaaataatgatatttatagtcgaccaagaggtaaatttagatttccaatacaagtt 
tctgaaaatgaatatacgcaaaaaaatgaaaattataatgaacataaccaagaagaaaca 

aacgatataatgagatcatataaccagcatgataatcctgaatttgattcttctggtaaa
agacatcgacgccgacgccaagcgtattcaaaacacgatcaatctaagattacacaacaa 
aagcaatttgcagataacaattatacaaataataacagtgtttttaatcaaaacgacaat 
aagaaatcttcacaacaacgtaaatcaatacaatctgaaaatatcaaaaacaaagcaaac 
actaagaatacgtcgacatctcctgaatttacatatttaaatcatagttttaaatcaagc 

gaggtaccctcagcgatttttggtacaaaaaaacgaagaccgattgagaatggtgtcata
ccgccagaacataaggaattaaatgataaagagattgttcaacaggatgaagtctcgcat
tcaacgaaatcaatagatgcatcaaaaaatgtttctaatagtaacgataacaatattgaa 
aaaaatcaacagaaaaaacaacaaacaactgctcaaactgagtcatcatcagaaaatatg 
cataatgttgaaaagtcaaattatcaaactactaagcgtaaaacaccaaattactctaaa 
gtagataatacgattaatattgaaaatatctatgcttcacaaattgtagaagaaatcaga
agagaaagagaacgtaaagttctacagaaacgacgttttaagaaagccttacaacaaaaa 
cgtcaacaaaatcaacagtcagaagaggattcaattcaaaaagctattgatgaaatgtat 
gctaagcaagcccaacattacacaggcgaaagttcattggatttagaaaatgaaagtaat 
caagattcgtcatctaatagtctagagaaacaatcaaatagcagcaacattgacaataaa
gaagcccaaaataacacacctttatttaactacgaagaaattgacttagatacgacatca 
gatgtTCTTC 

Sequence 398 

MSWFDKLFGDDNGSNDDLLRKNKNRRQSQQSKQNNQDSLLPQNNDIYSRPRGKFRFPIQV
SENEYTQKNENYNEHNQEETNDIMRSYNQHDNPEFDSSGKRHRRRRQAYSKHDQSKITQQ 
KQFADNNYTNNNSVFNQNDNKKSSQQRKSIQSENIKNKANTKNTSTSPEFTYLNHSFKSS 
EVPSAIFGTKKRRPIENGVIPPEHKELNDKEIVQQDEVSHSTKSIDASKNVSNSNDNNIE 
KNQQKKQQTTAQTESSSENMHNVEKSNYQTTKRKTPNYSKVDNTINIENIYASQIVEEIR 

RERERKVLQKRRFKKALQQKRQQNQQSEEDSIQKAIDEMYAKQAQHYTGESSLDLENESN
QDSSSNSLEKQSNSSNIDNKEAQNNTPLFNYEEIDLDTTSDVLX 

Sequence 399 

Contig_0475_pos_6773_7180,
is similar to (with p-value 3.0e-46)

>gp:gp|Y13384|LLLNISZ_1 Lactococcus lactis nisZ gene and 3 O
RF's. NID: g3157416.

atgatgaaaaataaattaacattaaaagagaatctatttatcggctcaatgctgtttggt 
cttttttttggtgctggaaatctcatttttccaattcacttaggtcaaactgcgggggca 
aatgtatggaccgccaatttaggatttcttatcacggctatcggactaccttttttagga
attatagcgataggtgtatctaaaacaaacggggtctttgaaatttcctcaaggataagt 
aaaatatatggttatttgttcacaattggcttgtatcttgttataggtccgttttttgcg 
ttgccaagacttgcgacgacgtcatttgaaatagcattttcaccatttatttcatctggt 
acggcccaagcgttgttgctatttttagtattttattcttcggagtag 


Sequence 400 

MMKNKLTLKENLFIGSMLFGLFFGAGNLIFPIHLGQTAGANVWTANLGFLITAIGLPFLG 
IIAIGVSKTNGVFEISSRISKIYGYLFTIGLYLVIGPFFALPRLATTSFEIAFSPFISSG 
TAQALLLFLVFYSSE*


Sequence 401 

Contig_0475_pos_7273_8133,

is similar to (with p-value 1.0e-49) 

>sp:sp|P54104|BRNQ_LACDL BRANCHED CHAIN AMINO ACID TRANSPORT
SYSTEM CARRIER PROTEIN. >pir:pir|S60180|S60180 branched-cha
in amino acid carrier brnQ - Lactobacillus delbrueckii >gp:g
p|Z48676|LDBRNQGN_1 L.delbrueckii brnQ gene for branched-cha
in amino acid carrier. NID: g732812.

gtgcttgcatttatccgtcctatgggtggaattagtcatgcgccagtaagtgctgattat 
agcaatagcgtgttactcaaagggtttatcgatggatataatacattagacgctttggca
tcattagcatttggtattatcattgttactacaattaaaaagttggggattactaatccg 
aatacaatcgctaaagaaactttaaaatcaggtacgattagtattatagctatgggcgtt 
atttatactttattagctttaatgggtacgatgagtttaggtcgttttaaagtaagtgaa 
aatggtggtattgcgcttgctcagattgcacaacattatttaggggattacggaattatt 
attttgtcactaatcatcattgtggcatgtctgaaaacagcaataggattgatcacagcc
ttttcggaaacatttacagagttattccctaaatctaactatctttggttagctactggg 
gtgagtatattagcttgtatatttgctaatgtaggtttaacaaaaattattatgtattca 
acaccagtgttgatgttcatttatcctttagcgattactttaattttattagcattactt
agtccattatttaaacattctaaaattgtctatcgatttacaacattatttacaatggtg 
gcggcatttgtagatggtgtgaaagcaagtccagagttctttgttaatacaaaatttgca
caaacaatcattggatttggtgaaaattatctcccattctttaacattggtatgggatgg 
attgttccagcacttattggtttcattattggtattattgtatactttatgactgctaaa 
aaatcgtcccacgtacaataa 
Sequence 402

VLAFIRPMGGISHAPVSADYSNSVLLKGFIDGYNTLDALASLAFGIIIVTTIKKLGITNP 
NTIAKETLKSGTISIIAMGVIYTLLALMGTMSLGRFKVSENGGIALAQIAQHYLGDYGII 
ILSLIIIVACLKTAIGLITAFSETFTELFPKSNYLWLATGVSILACIFANVGLTKIIMYS
TPVLMFIYPLAITLILLALLSPLFKHSKIVYRFTTLFTMVAAFVDGVKASPEFFVNTKFA
QTIIGFGENYLPFFNIGMGWIVPALIGFIIGIIVYFMTAKKSSHVQ* 

Sequence 403 

Contig_0475_pos_5847_5449,

putative peptide of unknown function

atgccacacctaggtacaaatgctgttgatattttagttgattttgtaaatgaaatgaaa 
caagaatataaaaatattaaagaacatgataaagtacacgagttagacgctgttccaatg 
attgagaaacatctccacagaaaaattggtgaagaagaatcacatatctactctggattt 
gtaatgttaaactctgtattcaatggtggtaaacaagttaattctgttcctcataaagcg 

acagctaaatataatgtaagaactgttccagaatatgacagtactttcgtgaaggattta
tttgaaaaagtcattcgtcatgtgggcgaagattatttaactgtagatatacctagcagt 
cacgatccagtggcaagtgatcgttggagatttaattaa 

Sequence 404 

MPHLGTNAVDILVDFVNEMKQEYKNIKEHDKVHELDAVPMIEKHLHRKIGEEESHIYSGF
VMLNSVFNGGKQVNSVPHKATAKYNVRTVPEYDSTFVKDLFEKVIRHVGEDYLTVDIPSS 
HDPVASDRWRFN* 

Sequence 405 
Contig_0475_pos_4351_2876,

is similar to (with p-value 4.0e-90) 

>gp:gp|AF006665|AF006665_31 Bacillus subtilis 168 region at
182 min containing the cge gene cluster. NID: g2529445. >gp:
gp|AF015775|AF015775_7 Bacillus subtilis YodA (yodA), YodB (
yodB), YodC (yodC), YodD (yodD), ABC-transporter (yodE), per
mease (yodF), proteinase (ctpA), YodH (yodH), YodI (yodI), c
arboxypeptidase (yodJ), purine nucleoside phosphorylase (deo
D), YodL (yodL), YodM (yodM), YodN (yodN), YodO (yodO), YodP
(yodP), acetylornitine deacetylase (argE), butirate-acetoac
etate CoA transferase (yodR), butyrate acetoacetate-CoA tran
sferase (yodS), YodT (yodT), CgeE (cgeE), CgeD (cgeD), CgeC
(cgeC), CgeA (cgeA), CgeB (cgeB), YzxA (yzxA), UDP-glucose e
pimerase (yodU), YodV (yodV), and YodW (yodW) genes, complet
e cds; and YodZ (yodZ) gene, partial cds. NID: g2415383. >gp
:gp|Z99114|BSUB0011_121 Bacillus subtilis complete genome (s
ection 11 of 21): from 2000171 to 2207900. NID: g2634230.
atgaatgatcatcaaaaaaatcatgcaacatctcaagatgataacacaatgtcaacacca 
tctaagaatagcaagcatataaaaattaaattatggcatttcatactcgttattttgggt 
attattcttttaacatccatcattactgtagtatcaacaattttaattagccatcaaaaa 

agtggtttaaataaagaacaacgtgcaaatttaaaaaaaattgaatatgtctatcaaaca
cttaataaagattattacaaaaagcaaagttctgataaattaactcaatctgccatagat 
ggtatggttaaagaacttaaagatccatattcagaatatatgactgctgaagaaacaaaa 
caatttaatgaaggtgtatcaggtgatttcgttggcataggtgctgaaatgcaaaagaaa 
aatgaacagataagtgttactagcccaatgaaggattcaccagcagaaaaagctggtatt

caacctaaagatatcgtcacacaagtgaatcatcattcggtagtcggtaaaccacttgat
caagttgttaaaatggtccgcggcaaaaaaggaacatatgttactttaactataaaacgt 
ggttcgcaagaaaaggatattaagattaaacgcgataccattcacgttaagagtgtagag 
tatgagaagaaaggcaatgtaggcgtactaacaatcaataaattccaaagcaatacttct 
ggtgaactcaaatctgcaatcatcaaagcgcataagcaaggcatccgtcatatcatttta 

gatttgagaaataatccgggggggttattagatgaggcagtcaagatggctaacatcttt
attgataagggaaatactgtcgttcaattagaaaaaggtaaggataaggaagaattaaaa 
acttctaatcaagcactaaaacaagcaaaagatatgaaagtatccatcttagttaatgag 
ggatcagctagtgcttcagaagtgtttacaggtgctatgaaagactatcataaagctaaa 
gtttacggttctaaaacatttggtaaaggtatcgttcagaccactcgtgaatttagtgat 
ggttcattaattaaatatacagagatgaaatggctaacgcctgatggccattatattcat
ggtaaaggaattagaccagatgttagtatctcaacaccaaaataccaatcactcaatgtc 
attccagataacaaaacttatcatcaaggtgaaaaagataaaaatgttaaaacgatgaaa 
ataggtctaaaagctttaggttatccaattgataacgaaacaaacatatttgacgaacaa 
ttagaatctgctattaaaacatttcaacaagacaataatttaaaagttaatggcaatttt
gataaaaaaacaaatgataaatttactgaaaaactagttgaaaaagcgaataaaaaagat 
actgttttaaacgatttactaaacaaactaaaataa 

Sequence 406 

MNDHQKNHATSQDDNTMSTPSKNSKHIKIKLWHFILVILGIILLTSIITVVSTILISHQK
SGLNKEQRANLKKIEYVYQTLNKDYYKKQSSDKLTQSAIDGMVKELKDPYSEYMTAEETK
QFNEGVSGDFVGIGAEMQKKNEQISVTSPMKDSPAEKAGIQPKDIVTQVNHHSVVGKPLD
QVVKMVRGKKGTYVTLTIKRGSQEKDIKIKRDTIHVKSVEYEKKGNVGVLTINKFQSNTS 
GELKSAIIKAHKQGIRHIILDLRNNPGGLLDEAVKMANIFIDKGNTVVQLEKGKDKEELK 

TSNQALKQAKDMKVSILVNEGSASASEVFTGAMKDYHKAKVYGSKTFGKGIVQTTREFSD
GSLIKYTEMKWLTPDGHYIHGKGIRPDVSISTPKYQSLNVIPDNKTYHQGEKDKNVKTMK
IGLKALGYPIDNETNIFDEQLESAIKTFQQDNNLKVNGNFDKKTNDKFTEKLVEKANKKD 
TVLNDLLNKLK* 

Sequence 407

Contig_0475_pos_2540_2037,

putative peptide of unknown function 

atgattagattagcaactaaagatgatttacttagtattactcaattagtcaaagaggct 
aaacagattatggaagaattcaacaacaaccaatgggatgatgaatatcccgcgaaagag 

cattttgaagaagacatcgaaaataaaacactatatgttttagacgttgatcatacaatt
tatggttttattgttatcgaccaaaatcaatcggagtggtatgatgacattgattggcct 
gttaatcgaaatggggcatacgttattcacagattagctggatcaaaacaatataaaggt
gctgcgactgaacttttccaatttgccattgacttagcaaatgaacatgatattcatgtc
attttaacagatacatttgccctcaataaacctgctcaaggattatttgaaaagtttggt

tttactaaagttgatgagatagagatagattatcatccttttgatagaggggcacctttt
tatgcatattataaaaacatataa

Sequence 408 

MIRLATKDDLLSITQLVKEAKQIMEEFNNNQWDDEYPAKEHFEEDIENKTLYVLDVDHTI 
YGFIVIDQNQSEWYDDIDWPVNRNGAYVIHRLAGSKQYKGAATELFQFAIDLANEHDIHV
ILTDTFALNKPAQGLFEKFGFTKVDEIEIDYHPFDRGAPFYAYYKNI*

Sequence 409 

Contig_0475_pos_2019_946,

is similar to (with p-value 1.0e-63)

>gp:gp|AF068902|AF068902_4 Streptococcus pneumoniae D-glutam
ic acid adding enzyme MurD (murD), undecaprenyl-PP-MurNAc-pe
ntapeptide-UDPGlcNAc GlcNAc transferase (murG), cell divisio
n protein DivIB (divIB), orotidine-5'-decarboxylase PyrF (py
rF), and orotate phosphoribosyltransferase PyrE (pyrE) genes
, complete cds; and unknown genes. NID: g4009477.
atgacaaaaattgcatatacaggtggaggaacagtaggacacgtttcagtgaatttaagt 
ttaattcctacttcgattgaaaaaggacacgaagcattttatattggttcaaaacatggt 
attgaaagggaaatgatagagtcacaactccctgatattcaatattatccaatatcaagc 

ggtaaattacgtcgttatctatcttttgaaaatgcaaaagatgtctttaaagttttgaaa
ggaattttagatgcacgtaaaatacttaaaaaacaaaaaccagacttacttttttcaaaa 
ggtggttttgttagtgttccggtagttatagccgcacgttctttaaaaattccaactatc 
atacacgaatcagatttaactcctggattagctaataaaatttctttaaaatttgctaag 
aaaatatacacaacctttgaagatacacttacatatcttccaaaagataaagctgatttt 

gttggggctactgtacgtgaggacttaaaacaagggaataaagaaagaggatatcaactc
actgattttgataaaaataaaaaagtgttattagtcatgggaggaagtttaggtagtaaa 
aaacttaataatatcattcgtcaaaatattgaggcacttctccacgattatcaaattata 
cacttaactggaaaaggacttgttgatgactcaatcaataaaaaaggttatgttcaattt 
gaatttgttaaagacgacttaactgatttattagcaatcactgatactgttgtaagtcgt 
gcaggttctaacgcaatttatgaatttttaacgctacgtataccgatgttactcatcccc
ttaggacttgatcaatcaagaggagatcaaattgataatgctaaaaactttgaatctaag 
ggttatggtcgtcatattcctgaagatcaacttacagaagttaacttattgcaagaatta 
aatgatattgaattacatcgtgaatctattattaaacaaatggaaacatatcaagagagt 
tacacgaaagaagatttatttgataaaattattcatgatgcattaaacaagtag

Sequence 410 

MTKIAYTGGGTVGHVSVNLSLIPTSIEKGHEAFYIGSKHGIEREMIESQLPDIQYYPISS 
GKLRRYLSFENAKDVFKVLKGILDARKILKKQKPDLLFSKGGFVSVPVVIAARSLKIPTI 
IHESDLTPGLANKISLKFAKKIYTTFEDTLTYLPKDKADFVGATVREDLKQGNKERGYQL
TDFDKNKKVLLVMGGSLGSKKLNNIIRQNIEALLHDYQIIHLTGKGLVDDSINKKGYVQF 
EFVKDDLTDLLAITDTVVSRAGSNAIYEFLTLRIPMLLIPLGLDQSRGDQIDNAKNFESK
GYGRHIPEDQLTEVNLLQELNDIELHRESIIKQMETYQESYTKEDLFDKIIHDALNK* 

Sequence 411

Contig_0475_pos_933_319,

is similar to (with p-value 3.0e-21) 

>pir:pir|S32217|S32217 hypothetical protein 2 - Bacillus meg
aterium >gp:gp|Z21972|BMCTP450A_3 B.megaterium cytochrome P4
50meg, ORF1 and ORF2 genes. NID: g288298.

atgaatcgatggaaacgcatttcattgcttattgtttttacacttatttttggtataata 
gctttttttcatgaatcaaggcttggaaaatggatagataacgaagtatatgaatttatt 
tattcatctgaaagtttcattaccacatctattatgttaggtgtaacaaaaattggtgaa
gtttgggcaatggttgcgctatccttattattagttgcttaccttatgctaaaacgcttc 

aagattgagacattattctttgtaatagtaatgagcttatctagtacactcaatccacta
ttaaagaatatctttgatagggaacgtccaacattattgcgtttaattgacatttcaggc 
tttagttttccaagcggtcatgctatgggctcaacttcattctttggaagcgctatatat
gtaataaaccgtcatgattcgggtatctctaaaggcgtgttaatcggtttatgcgcactt
ttcattttattaatatcaacttctagagtgtatctaggcgttcattaccctacagatatt 

attgccggcattattggtggtgtattctgccttttactcagtactttattactacctaaa
cagttaatagcttag 



Sequence 412

MNRWKRISLLIVFTLIFGIIAFFHESRLGKWIDNEVYEFIYSSESFITTSIMLGVTKIGE
VWAMVALSLLLVAYLMLKRFKIETLFFVIVMSLSSTLNPLLKNIFDRERPTLLRLIDISG 
FSFPSGHAMGSTSFFGSAIYVINRHDSGISKGVLIGLCALFILLISTSRVYLGVHYPTDI
IAGIIGGVFCLLLSTLLLPKQLIA*


Sequence 413 

Contig_0476_pos_1619_2608,

is similar to (with p-value 2.0e-56) 

>sp:sp|P54948|YXEI_BACSU HYPOTHETICAL 37.2 KD PROTEIN IN IDH
-DEOR INTERGENIC REGION. >gp:gp|Z99124|BSUB0021_58 Bacillus
subtilis complete genome (section 21 of 21): from 3999281 to
4214814. NID: g2636442. >gp:gp|D45912|D45912_12 Bacillus su
btilis genome sequence between the iol and hut operon, parti
al and complete cds. NID: g1408482.
gtgttatatatgtgtactgccatttctttatatacaaaacaacgttaccattatttagct
agaacaatggactttgcatttgaatttaatggtatcccaaccattgttccacgccattat 
cactaccaatttgatctagattcagacatgcgtcttgaatatggttttgttggaacaaat 
ttaaaagtaggacgttatagatttggtgatggtataaacgaaaaaggtttagctatttcg 
aaccattacttcactggtgaagcctcatacagtacccataaacgttatggttattttaac 
ttagcacctgaggagtttattgtttgggttttaggttttaataaaagtattagcgaatta
aaacaaaaggttaagaaaatcaatattatgaatgaaaaaaatacgactttgaatatcgtt 
cctcctttacatttcatggtcactgatgaaacaggacataccgtagccatagaacctcac
aatggcttattaatagttaaagataattatgttcataccttaacaaatgaacctaaatta 
gattggcatctatctaacttaagaaattacgcttttttaacgccacagaaatcaaccaat 
caattaataggtaaagtgctagtaagatcaatgggctgtgaagcaggaacaaatggctta
ccgggtggttatacgtcaacagatcgttttatacgcgctacatatttaagacaccaacta 
cgctgttcccataatgaagatgaaaatttaatgaattgttttaaagttctagaatcagtc 
agtatccctcaaggtgcagttatcgatgccaataaaatacattacacacaatatcaatta 
gtgatggaaagtaaagaaagaagttattatattaagccttactttagcaatcaaattttc
aaaataaaattaactgaagaccttttaagtaagaatgagatgacattcttacctattaat 
cacgaattaaagataacatcaatacaatag 

Sequence 414 

VLYMCTAISLYTKQRYHYLARTMDFAFEFNGIPTIVPRHYHYQFDLDSDMRLEYGFVGTN
LKVGRYRFGDGINEKGLAISNHYFTGEASYSTHKRYGYFNLAPEEFIVWVLGFNKSISEL
KQKVKKINIMNEKNTTLNIVPPLHFMVTDETGHTVAIEPHNGLLIVKDNYVHTLTNEPKL 
DWHLSNLRNYAFLTPQKSTNQLIGKVLVRSMGCEAGTNGLPGGYTSTDRFIRATYLRHQL 
RCSHNEDENLMNCFKVLESVSIPQGAVIDANKIHYTQYQLVMESKERSYYIKPYFSNQIF

KIKLTEDLLSKNEMTFLPINHELKITSIQ*

Sequence 415 

Contig_0476_pos_2841_3542,

is similar to (with p-value 3.0e-41) 

>sp:sp|P39610|THID_BACSU PHOSPHOMETHYLPYRIMIDINE KINASE (EC
2.7.4.7) (HMP-PHOSPHATE KINASE) (HMP-P KINASE). >pir:pir|S39
707|S39707 hypothetical protein - Bacillus subtilis >gp:gp|X
73124|BSGENR_53 B. subtilis genomic region (325 to 333). NID:
g413923. >gp:gp|Z99123|BSUB0020_97 Bacillus subtilis comple
te genome (section 20 of 21): from 3798401 to 4010550. NID:
g2636240.

atggataaagaaacatggtcccatgatgtaacacctattgatatgaatgttttcgaaaaa 
caacttgaaactgcaatatcaattggacctgatgctattaaaacaggaatgttagggaca 
caagacattattaaacgtgccggagatgtttttgttgaatctggtgcagactattttgta 

gttgatccagtaatggtttgtaaaggagaagacgaagtacttaacccaggaaacacagaa
gcaatgattcaatatttactacctaaagctacagttgttaccccgaatttattcgaagca
ggtcaactctctggtttaggaaaattaacatcaattgaggatatgaaaaaagctgctcaa 
gtgatttatgacaaaggcacacctcatgtcattattaaaggtggtaaagcactcgatcaa 
gataaatcttatgacttgtactatgatggccaacaattttatcaattaactactgacatg 

ttccaacaaagttataatcatggtgcaggatgcacatttgctgctgccacaacagcttat
cttgcgaacggtaaatctccaaaagaagcaatcattgctgctaaagcatttgtagcttca
gcaatcaaaaatggttggaaaatgaatgactttgtaggacctgttgatcatggtgcatat 
aaccgtattgaacagattaacgttgaagtcactgaggtttaa 

Sequence 416

MDKETWSHDVTPIDMNVFEKQLETAISIGPDAIKTGMLGTQDIIKRAGDVFVESGADYFV
VDPVMVCKGEDEVLNPGNTEAMIQYLLPKATVVTPNLFEAGQLSGLGKLTSIEDMKKAAQ
VIYDKGTPHVIIKGGKALDQDKSYDLYYDGQQFYQLTTDMFQQSYNHGAGCTFAAATTAY
LANGKSPKEAIIAAKAFVASAIKNGWKMNDFVGPVDHGAYNRIEQINVEVTEV*


Sequence 417 

Contig_0476_pos_10083_11204, 

is similar to (with p-value 0.0e+00) 

>sp:sp|P05425|ATKB_ENTFA POTASSIUM/COPPER-TRANSPORTING ATPAS
E B (EC 3.6.1.36). >pir:pir|B45995|B45995 Cu2+-transporting
ATPase (EC 3.6.1.-) - Enterococcus hirae >gp:gp|L13292|ENECO
PPUMP_2 Enterococcus hirae ATPase (copA) gene, complete cds;
ATPase (copB) gene, complete cds. NID: g290641.
atgctcattcaaaatgatgttgattttgcattagaacgtcttgtaactgtgttagtcatt
gcttgtccacatgctttaggcttggcaatacctttagtcactgcacgttctacttcaatt
ggtgcacataatggtttaattattaaaaatagagagtctgtagaaatagctcaacatatc 
gattatgtaatgatggataaaactggtactttaactgagggtaacttttctgtgaatcat 
tatgagagctttaaaaatgatttgagtaatgatacaatattaagccttttcgcctcatta 
gaaagtcaatctaatcacccattagctataagtattgttgattttgcgaaaagtaaaaat
gtttcatttactaatccacaagacgttaataatattccaggtgtcggattagaaggtcta
attgataataaaacatataaaataacaaatgtctcttatcttgataaacataaacttaat 
tatgacgatgacttgtttactaaattagctcaacaaggtaattcaatcagttatttaatt 
gaggatcaacaagtcattggcatgattgctcaaggagatcaaattaaagaaagctcaaaa 
caaatgatagctgatttactatcaagaaatattacaccagtcatgcttacaggtgacaat
aatgaagtggcacacgctgtcgcaaaagaattaggtattagtgatgttcacgcacaactc 
atgccagaagataaggaaagcattataaaagattatcaaagtgacggtaataaagtcatg 
atggtcggagacggtatcaacgatgcgccgagtcttataagagccgatattggtatagca 
attggtgcaggcacagatgttgcagtggattcaggtgatatcatacttgttaaaagtaat 
ccatcagatatcattcatttcttgactctttcaaataatactatgagaaaaatggtgcaa
aacttatggtggggtgcaggttataatattgttgctgtacctttagcagctggcgcatta 
gcttttatcgggttaatattatcaccagctgtaggagcaatattaatgtctttaagtaca 
gttatagtagcgattaatgcttttacattaaaattaaaataa 

Sequence 418

MLIQNDVDFALERLVTVLVIACPHALGLAIPLVTARSTSIGAHNGLIIKNRESVEIAQHI
DYVMMDKTGTLTEGNFSVNHYESFKNDLSNDTILSLFASLESQSNHPLAISIVDFAKSKN 
VSFTNPQDVNNIPGVGLEGLIDNKTYKITNVSYLDKHKLNYDDDLFTKLAQQGNSISYLI 
EDQQVIGMIAQGDQIKESSKQMIADLLSRNITPVMLTGDNNEVAHAVAKELGISDVHAQL 

MPEDKESIIKDYQSDGNKVMMVGDGINDAPSLIRADIGIAIGAGTDVAVDSGDIILVKSN
PSDIIHFLTLSNNTMRKMVQNLWWGAGYNIVAVPLAAGALAFIGLILSPAVGAILMSLST 
VIVAINAFTLKLK* 

Sequence 419 
Contig_0476_pos_9841_7541,

is similar to (with p-value 0.0e+00) 

>gp:gp|AF007865|AF007865_3 Bacillus licheniformis bacitracin
synthetase operon including bacitracin synthetase 1 (bacA),
2 (bacB) and 3 (bacC) genes, complete cds. NID: g2982193.

atgatacctgtgcattttatgaaggtggatcgtatacctatcacgatgaatgggaaatta
gatgtgcgtgcattacctgaaattaatctaaagaataatagaaattatgtagaaccacgt 
aacgatattgaacgcacagtttgccgtattttcgaagagattttacatgttgatcaggta 
ggtgttaaagataatttctttgaactaggtggacactctcttagagcaacattagttgta 
aaccgtattgaagaaaggttaaaaaaacgtcttaaagtaggtgatttaatgaaatcgcct 

actgtagagcaacttggacaacaaattgaagaactgcaaaatgatgtctatgaagtgatt
cccaaagcaaatgaatcgtatcaatatgatttaagtgcgtctcaaaaaagtatgtatctt 
ttatggaaggtcaatcctaaagacacagtgtataacattccattcttatggagattatct 
tctgaacttaatgttatgcaattgcaacgtgcattatctaagttgattgaacgtcatgaa 
atattacgaacacaatatgtaattgatgacaatgaagttaaacaacgtattgcgacacat 

gtttcgcctgattttgaagaggtaacgacatctctaacgaacgagcaagatattattcaa
tcatttatggaaccgtttgatttagaacaaccaagtcagatgcgagttaaatatatacat 
ggaccacaacaagattatttatttatggatactcatcatagtattaatgatggtatgagt 
aacacgattttactatctgatttgaacgctttataccaagataaatcattacctgaactt 
aagcttcagtataaagattatagtgagtggatggtgcacagagacttatctaaacaacgt 

cacttttggttacagcaatttgaaaatcaggttccaatattaaatatgcctacggattat
cctagaccaagtattaaaacaaccaacggtaatatgttgacgtttcattacaatcgtcaa 
atcaaacagcaattgaaatcttatgtagaacaacatcaagtgacagactttatgttcttt 
gctagtgcaatcatggtattattgcacaaatatacacgtcaggacgatatcgctattggt 
agtgtaatcagtgcgcgtactcatcgcgatactgaaaatatgttaggtatgtttgctaat 

acacttgtatatcgtggtcgaccacatgatcaaaagacatgggatcaattgatggctgag
atgaaagagatgtgtctaggggcatatgaacatcaagaatatccttttgaaagcttagtc 
aatgatcttgttgatgaaagagatgcttcacataatccgttatttgatgtgatgctcgta 
cttcaaaataatgaaacaaatcatgcgaattttggacatagtcaattgacacatattcca 
cctcagtcaacaacagctaaatttgatttgtcatttattattgaagaagatcaagatgac 

tatgtcgtcaatattgaatataatacagatttatataaacaagagaccattcatcatatt
gctgaacaacttcaaatgattattaaacatgtaatatctaccgaaaacctaaaaattcaa 
gatattgatgaaaatgatgacttattaatttggttggacaagcatgtgaatgattgttct 
ttagacttgccaaaaaataagtcaatacagcaacttttacatgatgtcatgaaagcgaaa 
gcagatgatgtagcacttaaaatgaatggacaatcgatgacgtatcaagaacttgatgat 
tattctaatagtatggctcaaacattgatacaaaatggcattcaaaaaggggaacgtgta
gcccttttaactgaacgaagttttgaaatggttgctagtatgattgctgtattaaaagtt 
ggaggttcttatgtacctattgacgtcacttatcccgataaacgcattgaatttattatt 
gaagacgctgaagtcgcagcagtgctcacatatggaaaagcaatatcctcacatatacca 
gtaattaaaattgaagatattgataacactgaaaataataaaaggttaaatatagaatat
gcagggaatttggaagatgatatgtatcatatttatacatctggaacaacaggaaagcct 
aaagcagtatcagtgaaacaacgtaatatattaaatttagtatgtgcttggacaaaaaga 
ctcaatttatccgatgatgaagtctatctgcagtacgctaattatgtgttcgatgcttcg 
gcaactgatttctactgttag 


Sequence 420 

MIPVHFMKVDRIPITMNGKLDVRALPEINLKNNRNYVEPRNDIERTVCRIFEEILHVDQV
GVKDNFFELGGHSLRATLVVNRIEERLKKRLKVGDLMKSPTVEQLGQQIEELQNDVYEVI 
PKANESYQYDLSASQKSMYLLWKVNPKDTVYNIPFLWRLSSELNVMQLQRALSKLIERHE 

ILRTQYVIDDNEVKQRIATHVSPDFEEVTTSLTNEQDIIQSFMEPFDLEQPSQMRVKYIH
GPQQDYLFMDTHHSINDGMSNTILLSDLNALYQDKSLPELKLQYKDYSEWMVHRDLSKQR 
HFWLQQFENQVPILNMPTDYPRPSIKTTNGNMLTFHYNRQIKQQLKSYVEQHQVTDFMFF 
ASAIMVLLHKYTRQDDIAIGSVISARTHRDTENMLGMFANTLVYRGRPHDQKTWDQLMAE 
MKEMCLGAYEHQEYPFESLVNDLVDERDASHNPLFDVMLVLQNNETNHANFGHSQLTHIP 

PQSTTAKFDLSFIIEEDQDDYVVNIEYNTDLYKQETIHHIAEQLQMIIKHVISTENLKIQ
DIDENDDLLIWLDKHVNDCSLDLPKNKSIQQLLHDVMKAKADDVALKMNGQSMTYQELDD 
YSNSMAQTLIQNGIQKGERVALLTERSFEMVASMIAVLKVGGSYVPIDVTYPDKRIEFII 
EDAEVAAVLTYGKAISSHIPVIKIEDIDNTENNKRLNIEYAGNLEDDMYHIYTSGTTGKP
KAVSVKQRNILNLVCAWTKRLNLSDDEVYLQYANYVFDASATDFYC* 




Sequence 421 

Contig_0476_pos_7178_6864, 

putative peptide of unknown function

gtgcaacaaaatttagggaaaatggaaagtttattactttctgcaagacattttctatgg 
agtacagctagagggtatcaatcatatacagaggatgcacaaatatggaatgaaacctca 
gcaagtaaagtggtggtaatgaaccaaggtatagaaatcgttgatttagctatgagaata 
gttggagctaagagtctagaaatgagcagacctcttcaacggtactatagagatatacgt

gctggattacataatccaccaatggaagatatggcttacactaatattgctaaaagtatt
acaaacaaactttaa 

Sequence 422 

VQQNLGKMESLLLSARHFLWSTARGYQSYTEDAQIWNETSASKVVVMNQGIEIVDLAMRI
VGAKSLEMSRPLQRYYRDIRAGLHNPPMEDMAYTNIAKSITNKL*

Sequence 423 

Contig_0476_pos_2968_2609,

is similar to (with p-value 2.0e-20) 

>sp:sp|P39610|THID_BACSU PHOSPHOMETHYLPYRIMIDINE KINASE (EC 2.7.4.7)
(HMP-PHOSPHATE KINASE) (HMP-P KINASE). >pir:pir|S39707|S39707
hypothetical protein - Bacillus subtilis >gp:gp|X73124|BSGENR_53
B. subtilis genomic region (325 to 333). NID: g413923.
>gp:gp|Z99123|BSUB0020_97 Bacillus subtilis complete genome
(section 20 of 21): from 3798401 to 4010550. NID: g2636240.

atgtcttgtgtccctaacattcctgttttaatagcatcaggtccaattgatattgcagtt 
tcaagttgtttttcgaaaacattcatatcaataggtgttacatcatgggaccatgtttct 
ttatccattgttacaatagatgttaaagcgaccattccatatacatcaagttcttggaac 
gttttaagatctgcttgcataccggcaccagcacttgtatctgaaccagctatcgttaat
actttttttaaagccatcattcattcactcccattaatttctagtgtctttatcatatca
tgtttatcgcgtacgctaaattattataattttaaaatgcaaatcaatcatcatacttag

Sequence 424

MSCVPNIPVLIASGPIDIAVSSCFSKTFISIGVTSWDHVSLSIVTIDVKATIPYTSSSWN 
VLRSACIPAPALVSEPAIVNTFFKAIIHSLPLISSVFIISCLSRTLNYYNFKMQINHHT*

Sequence 425

Contig_0476_pos_1461_811,

is similar to (with p-value 3.0e-64) 

>pir:pir|S39712|S39712 hypothetical protein - Bacillus subtilis

atgaaatggtcagaggtatttcatgatataacaacgcgccatgattttcaggcgatgcat 
gactttttagaaaaagaatatacgactcaaaccgtctatccagatatacaaaatatctat 
caagcatttgatttaacgccgtttgaagatatcaaggttgttattttagggcaagatcct
tatcacggtcctaatcaagcacatggtttagcattttcagtgcaacctcatgctaaattt 

ccaccatctttaagaaatatgtatcaagaactagaaaatgatatagggtgtcatagaact
tcgcctcatttacaagactgggcaagagaaggtgtcttgttattaaatacggtattgact 
gttcgacaaggtgaagcacattcacatcgaaatattggatgggaaacattcacggatgaa 
atcatacaagctgtttctaattatcgtgagcatgttgtttttattctgtggggaagaccg 
gctcaacaaaaggaacgattcattgatacatctaaacacttaatcattaaatcgccacat 

cctagtccactatcggcttttagaggattttttggttctaaaccttattcaactacaaat
aactatttaaaatctaaagggaaaacaccagttcagtggtgtgaaagttag 

Sequence 426 

MKWSEVFHDITTRHDFQAMHDFLEKEYTTQTVYPDIQNIYQAFDLTPFEDIKVVILGQDP
YHGPNQAHGLAFSVQPHAKFPPSLRNMYQELENDIGCHRTSPHLQDWAREGVLLLNTVLT
VRQGEAHSHRNIGWETFTDEIIQAVSNYREHVVFILWGRPAQQKERFIDTSKHLIIKSPH
PSPLSAFRGFFGSKPYSTTNNYLKSKGKTPVQWCES* 

Sequence 427

Contig_0476_pos_810_442,

putative peptide of unknown function 

gtgagaataatgaacaaagaacagattctacaattgattgagcaagaattgatacaagca 
gatgaagctcagacagatacggaatttgaaaagcatatgtatgctatacacatgctcaca 
tctcttgttagttctcatcaaagtcgttctacaatagagaaattaaatcattctaaacca 
atgaatagtaatatcaaagatgattatgagatgaaacaacagtcttcacaaaaacatcat
gtaactgcagctgaaatagaagcaatgggtggtaaagtaccacaatcaatgaaaaagcat 
catacttctaataatatgatgattacagatgatcaagttggtaatggtgaatctattttt 
gatttttaa 

Sequence 428

VRIMNKEQILQLIEQELIQADEAQTDTEFEKHMYAIHMLTSLVSSHQSRSTIEKLNHSKP 
MNSNIKDDYEMKQQSSQKHHVTAAEIEAMGGKVPQSMKKHHTSNNMMITDDQVGNGESIF 
DF* 

Sequence 429

Contig_0476_pos_420_49,

is similar to (with p-value 3.0e-27)

>sp:sp|P39619|YWDK_BACSU HYPOTHETICAL 12.0 KD PROTEIN IN UNG-ROCA
INTERGENIC REGION. >pir:pir|S39716|S39716 hypothetical protein -
Bacillus subtilis >gp:gp|X73124|BSGENR_62 B. subtilis genomic region
(325 to 333). NID: g413923. >gp:gp|Z99123|BSUB0020_88 Bacillus
subtilis complete genome (section 20 of 21): from 3798401 to 4010550.
NID: g2636240.
atgatgaaagtttttattattttaggtgcattaaatgcaatgatggctgtcggtactggc 

gcatttggagcacatgggttggaagataaattatcagataaatacatgtcaatatgggaa
aaagcaacaacttatcaaatgtatcatggattaggtctgttagttataggtttaataagt 
ggtacaacatcaattaatgtaaattgggctggttggttattattctttggtattgtcttt 
ttcagtggttccttgtatttcttagccttaacacaagttcgtattttaggtgcaattacg 
ccaataggtggtgttctatttataattggttggcttgttcttgtgattgctacacttaaa
ttcgctgggtaa

Sequence 430

MMKVFIILGALNAMMAVGTGAFGAHGLEDKLSDKYMSIWEKATTYQMYHGLGLLVIGLIS 
GTTSINVNWAGWLLFFGIVFFSGSLYFLALTQVRILGAITPIGGVLFIIGWLVLVIATLK
FAG* 

Sequence 431

Contig_0477_pos_802_1206,

is similar to (with p-value 8.0e-16)

>sp:sp|P49856|YKKC_BACSU HYPOTHETICAL 11.9 KD PROTEIN IN HMP 3'
REGION. >gp:gp|D78189|BAC168TRP2_6 Bacillus subtilis hmp DNA for 7
ORFs, complete cds. NID: g1063245. >gp:gp|AJ002571|BSAJ2571_28
Bacillus subtilis 168 56 kb DNA fragment between xlyA and ykoR.
NID: g2632001. >gp:gp|Z99110|BSUB0007_191 Bacillus subtilis complete
genome (section 7 of 21): from 1194391 to 1411140. NID: g2633472.

atgtttgtgagaagccattgcctcgaagttacgaatttgagaaaatggttctattacttt 
aataggaggaataaaagaatgcaatggcttaaagttatattagccggttttattgaaatc 
atctgggtcactggacttgatcaagcgcactcattgtttacatggatatttaccctcttt
tttattgctttaagcttttttctagtcattgatgcttcgaagcacttaccagttggtacg 
gtatatgcattttttgtcggaatcggtgctgttggtacagtgttagttgatatgattttc
ttcaaccaaccatttactttcactaaaatctttttaataatgacccttattttaggaata 
ataggattaaaactgacaactgatgcaacgaaagaagggagataa 


Sequence 432 

MFVRSHCLEVTNLRKWFYYFNRRNKRMQWLKVILAGFIEIIWVTGLDQAHSLFTWIFTLF
FIALSFFLVIDASKHLPVGTVYAFFVGIGAVGTVLVDMIFFNQPFTFTKIFLIMTLILGI 
IGLKLTTDATKEGR* 

Sequence 433

Contig_0477_pos_1212_1529,

putative peptide of unknown function 

atggcttggttatttctaatgatagccggaagttttgaaattttgggcgttgttctatta 
aatgaactatcacgtacaaagaataaaatttatgtcatttttttaggattagcatttata
ttaagttttagtacattaaaatttgcaatggtatctattcctatgggtactgcatacgct 
atatggacaggaattggtacagctggtggtacattaattggaatgattttttatagagaa 
tctacacgtttaagtagaattttatgtattttattaatcatcatttcagttgttggatta 
cgtttaataagttattaa 

Sequence 434

MAWLFLMIAGSFEILGVVLLNELSRTKNKIYVIFLGLAFILSFSTLKFAMVSIPMGTAYA
IWTGIGTAGGTLIGMIFYRESTRLSRILCILLIIISVVGLRLISY*

Sequence 435

Contig_0477_pos_1643_2560,

putative peptide of unknown function 

gtgggacttaacttattgaaagaacattttgaggtagacatgtatgatggcgaggggctt 
attgataaagaaaccttaaaaaaaggggtagaacatgcagatgcattaattagtttacta 

tcaacttctgttgataaagatattattgatagtgctaataaccttaaaattatagcgaat
tatggtgcaggttttaataatattgatgtcgaatatgcaagacaacaaaatatagatgtt 
acaaatacaccacacgcttcgacaaatgctactgctgatttaacaatcggtttaatttta 
tcagtagcgcgtagaattgtagaaggagatcatttatccagaacaacaggttttgatggt 
tgggcacccttattcttccgaggcagagaggtatcaggaaaaactattggtattataggc 

ttaggtgaaattggaggtgcagtagctaaacgcgcacgcgcatttgatatggatgttctg
tacactggtcctcatcgtaaggaagaaaaagaacgagatatcggtgcgaaatatgtagat 
ttagatactttacttaaaaatgcagattttattacaatcaatgcggcatataatccatca 
ctgcatcatatgattgatactgaacaatttaataaaatgaaatctactgcctatttaatt 
aatgcaggacgtggtccaatagtaaatgaacaatctttagttgaagcccttgataataaa
gctattgaaggtgctgcattggatgtatatgaatttgagccagaaatcactgatgcatta
aaatcatttaaaaacgttgtgcttacacctcacattggtaatgcaacatttgaagctaga
gatatgatggctaaaattgttgcgaatgatacaataaaaaaattaaatggtgatgaacct 
cagtttattgtcaattaa 

Sequence 436

VGLNLLKEHFEVDMYDGEGLIDKETLKKGVEHADALISLLSTSVDKDIIDSANNLKIIAN
YGAGFNNIDVEYARQQNIDVTNTPHASTNATADLTIGLILSVARRIVEGDHLSRTTGFDG
WAPLFFRGREVSGKTIGIIGLGEIGGAVAKRARAFDMDVLYTGPHRKEEKERDIGAKYVD
LDTLLKNADFITINAAYNPSLHHMIDTEQFNKMKSTAYLINAGRGPIVNEQSLVEALDNK
AIEGAALDVYEFEPEITDALKSFKNVVLTPHIGNATFEARDMMAKIVANDTIKKLNGDEP 
QFIVN* 

Sequence 437

Contig_0477_pos_6327_5923,

is similar to (with p-value 2.0e-16)

>sp:sp|Q02115|LYTR_BACSU MEMBRANE-BOUND PROTEIN LYTR.
>pir:pir|A47679|A47679 lyt divergon expression attenuator LytR -
Bacillus subtilis >gp:gp|M87645|BACLYTABCD_1 Bacillus subtilis
membrane bound protein (lytA and lytR); amidase enhancer (lytB); and
amidase (lytC) genes, complete cds's. NID: g143155.
>gp:gp|Z99122|BSUB0019_62 Bacillus subtilis complete genome (section
19 of 21): from 3597091 to 3809700. NID: g2636029.

atgggggctaatcactttgttaaaggtgaaaaaacacacgtagatggtgatgctgccatg
gactttattagaagtcgtaaagaagatggggcaggaggcgattttggtagacaagagcgt 
cagcaacttatcttagaagcgatggcagataagatgacaagcgcttcttcaatcactcat 
tttaatacattaatgaatcaaattcagaaaaatgttaaaacagatttaaaattaggtgat 
cttaatacaattagaactaagtataaagatgctaatgaccaagttaatcgacatcagtta

gagggtgaaggtggtatacaaaatgacggtttgtactatttcataccaagtgatgcatct
aaaaatgaaaatacacaattactaagagacaatttaaatttataa 

Sequence 438 

MGANHFVKGEKTHVDGDAAMDFIRSRKEDGAGGDFGRQERQQLILEAMADKMTSASSITH 
FNTLMNQIQKNVKTDLKLGDLNTIRTKYKDANDQVNRHQLEGEGGIQNDGLYYFIPSDAS
KNENTQLLRDNLNL* 

Sequence 439 

Contig_0477_pos_4992_4216,

is similar to (with p-value 3.0e-45)

>gp:gp|U71377|SEU71377_4 Staphylococcus epidermidis autolysin AtlE
and putative transcriptional regulator AtlR genes, complete cds.
NID: g2267238.

atgttaaagaatactcggcttcgaatgacgacattgttcataattagcatactcgtcatt 
ttagcaatactttttcttatatttgatacaaacctttttaaaaatgatgttaaacataca
tttaaagaagcggtgtctcttcaaacaagtgagggaaatatccatactaaagaagttaat 
ggtaagtttatatatgcttccaaacaagatatagagaaagctatgcaaataaaacatagt
gataatgatttgaagtacatggatatatcagaaaaagtacctatgtcagagaaggaagtt 
aaccatatcttaaaaggaaaaggtattttagaaaataagggatcaacgtttattaaagcc 
caagataaatatgaagtgaatatcctatatctcatcagtcatgcactagttgagacagga
aatggtcaatcagatttatcaaaaggaattaaagaaggtaaccatcactattataacttt 
tttggtattggtgcttttgatgaagatgctgtaaagactggtaagagttttgctaaacag
aagaagtggaccactcctgaaaaagcgataatgggtggcgcgtggtttgtgagataccat 
tactttaaaaataatcaattgagcttatatcaaatgcggtggaacccacaaaatccaggc 
caacatcaatatgctagtgatattcagtgggccaataatatagctgatttaatggagaaa
tactatgataaatatggaataaaaaaagatcatataagaaaaaaatattacaaataa 

Sequence 440 

MLKNTRLRMTTLFIISILVILAILFLIFDTNLFKNDVKHTFKEAVSLQTSEGNIHTKEVN
GKFIYASKQDIEKAMQIKHSDNDLKYMDISEKVPMSEKEVNHILKGKGILENKGSTFIKA
QDKYEVNILYLISHALVETGNGQSDLSKGIKEGNHHYYNFFGIGAFDEDAVKTGKSFAKQ
KKWTTPEKAIMGGAWFVRYHYFKNNQLSLYQMRWNPQNPGQHQYASDIQWANNIADLMEK 
YYDKYGIKKDHIRKKYYK* 

Sequence 441

Contig_0477_pos_3832_2708,

is similar to (with p-value 4.0e-24) 

>gp:gp|U29897|PAU29897_1 Pseudomonas aeruginosa FAD binding protein
homolog gene, partial cds. NID: g912581.

atgaaaatagcaatagtaggcgcaggtataggtggtttaactgctgctgcgttattagaa 
gaacaaggtcatcaagttaaagtgtttgaaaaaaatacttctataaacgaattaagcgct 
ggtattgggataggagataatgttttaaaaaaattagggcatcatgaccttgctaaaggc 
attaaaaatgctggtcaaaatcttaccgcaatgaatatttatgatgagcaaggcacccca 

ttaatgagcgctaaattgaagtctcattccctaaatgtcgcattatctagacaaacttta
attgagatcatacagtcatatgtcgaagaatcatctattcacacaggatttaaagttact 
aaaattgaacaaacgagttgtaaggttaccctacattttaccaaacaggaaagtgaatcg 
tttgatttgtgtattggtgctgatgggttacattctgtagtaagagagtctgtaggtgca 
cgaactaaaattcgttacaatggttacacatgttttagaggcatggttgaagatgtacaa 

tttaatgaccaacatgttgcgaatgaatattggggtgttaaaggacgagtaggtatagtc
ccattaattaatcaacgtgcttattggtttattactgttcatgctaaagaaggagatcca 
aaatatcaatcttttggaaaaccccatcttcaagcatattttaatcactttccaaatgaa 
gtgagaaatgtgttagaaagacaaagtgaaacaggtatattacttcatgacatatatgat 
ttaaaaccactgaagacattcgtttatggacgtactattttaatgggcgatgctgcgcat 

gccactacgcctaatatgggacaaggtgctagtcaagctatggaagatgcaattgtatta
gtgaattgtttagaaaaatatgattttaataaagcgattgagcgttatgataaacttaga 
gttaaacataccacaaaagtgattaggcgttcgaaaaagataggtaaaatggctcaaaag 
catcataaattaactgttaaacttagaaataccgcgatgaaattaataccaaatgctttg
gcatcagctcaaacaaaatttttatacaaatccaaagaaaagtaa 

Sequence 442

MKIAIVGAGIGGLTAAALLEEQGHQVKVFEKNTSINELSAGIGIGDNVLKKLGHHDLAKG 
IKNAGQNLTAMNIYDEQGTPLMSAKLKSHSLNVALSRQTLIEIIQSYVEESSIHTGFKVT
KIEQTSCKVTLHFTKQESESFDLCIGADGLHSVVRESVGARTKIRYNGYTCFRGMVEDVQ 
FNDQHVANEYWGVKGRVGIVPLINQRAYWFITVHAKEGDPKYQSFGKPHLQAYFNHFPNE
VRNVLERQSETGILLHDIYDLKPLKTFVYGRTILMGDAAHATTPNMGQGASQAMEDAIVL
VNCLEKYDFNKAIERYDKLRVKHTTKVIRRSKKIGKMAQKHHKLTVKLRNTAMKLIPNAL
ASAQTKFLYKSKEK* 

Sequence 443

Contig_0478_pos_5223_6236,

is similar to (with p-value 5.0e-32)

>gp:gp|AL034447|SC7A1_23 Streptomyces coelicolor cosmid 7A1.
NID: g4007715.

atgctctcaagagcaccatttggatttaaaggcaatcatatacctgctttaattggctgg
gtaggtcaagttggttggttatctgttaatgtttctacaggaactttaactcttctggct 
ttattcaatacttttggttttaagactagtacatttctaattttgatgagtttagcgatt 
tttgctgggctagttattatatctgttcttttttcacaaaaagtacttgtatcagtacaa 
acatttttcacatatgtatttggtgcattaaccttattagttataacaattttaattact 

aatactgattggaacgcccttttttctatgaaatctgggtcttggcttaaaggttttcta
cctgcattagcctttgtaatagtagggactggattgagttggactaacgcagctgcagat 
tatagccgttttcaaaaaaaatcgaacagttctttatcaataatcactagtgttacagct 
ggcgcgtttatccctttatttctcattataagtactggaattttattagctacttcagag 
ccacaattagcaaatgcagaaaatccaatattattaattagcgaagtactaccaaattgg 

atgacagtaatttacttaatatctgctttaggtggccttactcctatgtgttttttaggt
ttaaagtcctcaagattaattatgagtacttttgatttgaaagtaaaaaattctacagtt 
attattattcattcaattattattattgccattcctatttatgtcttagtagtttccaga 
aattttctcgctttttttgaaatgtttttaggagttttgggtattggattagctgcttgg 
tctgcaattttcattgttgattatgcaacattgagaaaaaatataggctatgaaaaaaaa
ttggtttgcgatccccagtataatagtctgaatattaaaacagtaatggtctggagtata
gcagtaatagtaggtgcattaataaacatttttattcttcaagttttgatataa 

Sequence 444

MLSRAPFGFKGNHIPALIGWVGQVGWLSVNVSTGTLTLLALFNTFGFKTSTFLILMSLAI
FAGLVIISVLFSQKVLVSVQTFFTYVFGALTLLVITILITNTDWNALFSMKSGSWLKGFL 
PALAFVIVGTGLSWTNAAADYSRFQKKSNSSLSIITSVTAGAFIPLFLIISTGILLATSE
PQLANAENPILLISEVLPNWMTVIYLISALGGLTPMCFLGLKSSRLIMSTFDLKVKNSTV 
IIIHSIIIIAIPIYVLVVSRNFLAFFEMFLGVLGIGLAAWSAIFIVDYATLRKNIGYEKK 
LVCDPQYNSLNIKTVMVWSIAVIVGALINIFILQVLI*

Sequence 445

Contig_0478_pos_6370_6675,

putative peptide of unknown function 

gtgatcggcagtgctgttattgatgtgattttaaatgttaatagtataccaagtagtgga
tcagacgaatttgcccactctgagaagacaatagtaggtggttgcgcatataatgtaggc 
gatatacttagtcagttcaaagctaattatgatttgatggtgcccgttggcgatggtctt 
aatggaacaattattgaaaataagttaaaaaaagaaggcaaaacttcattattaaataat
atattaggtgataatggttggacgttatgtactgagagatcccctcataatttccccaaa 

gcgtaa

Sequence 446

VIGSAVIDVILNVNSIPSSGSDEFAHSEKTIVGGCAYNVGDILSQFKANYDLMVPVGDGL
NGTIIENKLKKEGKTSLLNNILGDNGWTLCTERSPHNFPKA*


Sequence 447 

Contig_0478_pos_8029_8592, 

putative peptide of unknown function 

atgacttggaaaaagaattggtttaaagaaattgatcttaataagtatgattatatttat 
gtgtcaggttattcttttgaacctccttcagacgaagttttattagaagaatttagtcgt
ttaaacgagaaaactacaattatttttgacccctcaccaaggattaataaaatgaactgt 
gagagtataaggaagttgcttgaaataaacacaatagtacatgccaacgaaggtgaaata 
ttacaattgagtagtgagaatcatgtgaaagatgcggcattagaagtaagtaaacagact 
aatcaacctgtgatagttacattaggcaacaaaggtactcttatagcaaataagtgtaaa 
gttaagattttagagggggaaaaggttcctgtaactgatactataggcgctggtgattca
cacacagcagcttttatagcaggtttgctagataaccaaagtattgaaaaagcttgtata 
tggggaaacgaagtagcatctaaaattgtgcaagaacgaggtggaaatacggatatattc 
aatcctatagataaagaatattaa 

Sequence 448

MTWKKNWFKEIDLNKYDYIYVSGYSFEPPSDEVLLEEFSRLNEKTTIIFDPSPRINKMNC
ESIRKLLEINTIVHANEGEILQLSSENHVKDAALEVSKQTNQPVIVTLGNKGTLIANKCK 
VKILEGEKVPVTDTIGAGDSHTAAFIAGLLDNQSIEKACIWGNEVASKIVQERGGNTDIF 
NPIDKEY* 

Sequence 449

Contig_0478_pos_10650_0,

is similar to (with p-value 1.0e-69)

>sp:sp|P16468|MAOX_BACST MALATE OXIDOREDUCTASE (NAD) (EC 1.1.1.38)
(MALIC ENZYME). >pir:pir|A33307|DEBSXS malate dehydrogenase
(oxaloacetate-decarboxylating) (EC 1.1.1.38) - Bacillus
stearothermophilus >gp:gp|M19485|BACMAL_1 B. stearothermophilus malic
acid gene, complete cds. NID: g143164.
atgtctttaagagatgacgctttagaaatgcatagagagaaccaaggtaaactagaaatt 
acaccaaatgttaaagtgacaaataagcaacaattaagcctagcatactcacctggcgtt
gcagaaccttgtaaagaaatccatgaagattcaagaaaagtatatgagtacactattaaa 
ggaaatacagttgctgttgtaacagatggaactgctgttctcggtttagggaatattgga 
gcagaagcaagtattccagtaatggaaggaaaggcagcactgttcaaaagttttgcgggt 
attaatggtgtgccaatagctctagatacaactgacactcaagaaatcataaaaacagta
aaacttattgcaccaaactatggtggaattaatcttgaagatatatcagctccccgctgt
tttgaaattgaagaaaccttaaagaaagagaccaatatacctatttttcatgacgatcaa 
catggtacagctattgttactatggctgggttaatcaatgctttaaaaattgtagataaa 
gagttaacggatataaaagttgtattaaatggtgcaggtgcagcaggtatcgctatagtg 
aagttacttcatgcttatggtgtgaataatatgattattcacaccataagca

Sequence 450 

MSLRDDALEMHRENQGKLEITPNVKVTNKQQLSLAYSPGVAEPCKEIHEDSRKVYEYTIK 
GNTVAVVTDGTAVLGLGNIGAEASIPVMEGKAALFKSFAGINGVPIALDTTDTQEIIKTV
KLIAPNYGGINLEDISAPRCFEIEETLKKETNIPIFHDDQHGTAIVTMAGLINALKIVDK
ELTDIKVVLNGAGAAGIAIVKLLHAYGVNNMIIHTISX



Sequence 451

Contig_0478_pos_10987_10646,

is similar to (with p-value 1.0e-28)

>gp:gp|U35659|SBU35659_1 Streptococcus bovis malic enzyme gene,
complete cds. NID: g1006838.

gtgtcagttgtatctagagctattggcacaccattaatacccgcaaaacttttgaacagt
gctgcctttccttccattactggaatacttgcttctgctccaatattccctaaaccgaga 
acagcagttccatctgttacaacagcaactgtatttcctttaatagtgtactcatatact 
tttcttgaatcttcatggatttctttacaaggttctgcaacgccaggtgagtatgctagg 
cttaattgttgcttatttgtcactttaacatttggtgtaatttctagtttaccttggttc 

tctctatgcatttctaaagcgtcatctcttaaagacatttaa

Sequence 452

VSVVSRAIGTPLIPAKLLNSAAFPSITGILASAPIFPKPRTAVPSVTTATVFPLIVYSYT 
FLESSWISLQGSATPGEYARLNCCLFVTLTFGVISSLPWFSLCISKASSLKDI*

Sequence 453

Contig_0478_pos_9610_9278,

putative peptide of unknown function 

gtgtcagtttcttataaaattgctaaaaatctattggatcacatgtacaaaaatgaggat 
agatttctagcattacatagaaactacgaaaaggaaaaactattatttcttactttacct
attattggactcataactataataggaagttcatttctcttcgattatttaatatttaaa 
ctgaataatacgtctgtagaaatattagggtccattcctactgttatatatcaaattatt 
atttgttttattcagtttatgttcacggctatgtttttaataatatttatttataccatt 
tggttttttatatatggaaagtttacaaaataa 


Sequence 454 

VSVSYKIAKNLLDHMYKNEDRFLALHRNYEKEKLLFLTLPIIGLITIIGSSFLFDYLIFK
LNNTSVEILGSIPTVIYQIIICFIQFMFTAMFLIIFIYTIWFFIYGKFTK*

Sequence 455

Contig_0478_pos_4758_3190,

is similar to (with p-value 0.0e+00) 

>gp:gp|Z99111|BSUB0008_149 Bacillus subtilis complete genome
(section 8 of 21): from 1394791 to 1603020. NID: g2633699.
>gp:gp|Z97025|BSZ97025_8 Bacillus subtilis nprE,
yla[A,B,C,D,E,F,G,H,I,J,K,L,M,N,O] and pycA genes. NID: g2224758.
atggttgacggtgtcgtactagtggttgacgcatatgaaggtacaatgcctcaaactcgt 
tttgttcttaaaaaagctttagaacaaaacttaaaaccggttgtagttgtgaataaaatt 
gataaaccagctgctagacctgagggagttgtagatgaagtattagacttattcattgaa

ttggaagcgaatgatgagcaattagacttcccagttgtttatgcttcagctgtgaatgga
acagcaagtttagactctgaaaagcaagacgaaaatatgcaatccctatacgagacgatt 
attgactatgttccggcaccagtagataattcagatgaaccattacaattccaaattgct
ttactagattataatgattatgtaggtcgtataggcgttggacgtgtgttcagaggtaaa 
atgcgtgtaggtgataatgtatcactaattaaattagatggtacagttaagaactttcgt
gtgacgaaaatatttggttactttggtcttaaacgtgaagaaattgaagaagcacaagca
ggagacttaatagctgtttcaggtatggaagatattaacgttggtgaaacagttacacca 
catgatcatcgtgacccattaccggtgttacgtattgatgaaccaaccctagaaatgact 
tttaaagtaaataactctccgtttgctggacgtgaaggtgattatgtaacagctcgacaa 
attcaagaaagattagatcaacaacttgaaacagatgtttctttaaaagttacacctact
gatcaaccagattcatgggttgttgctggtcgtggtgaactacacttgtctattcttatt 
gaaaacatgagacgtgaaggctttgaattacaggtttctaaacctcaagttattttaaga 
gaaatcgatggtgtgttaagtgaaccatttgagcgtgtacaatgtgaagtgccttctgaa 
aatgccggggcagtgattgagtcattaggtgcacgaaaaggtgaaatgttagatatgatg 

acgaccgacaatggtttgacgcgtttaatctttatggtacctgcacgcggtatgattggt
tatactactgaatttatgtctatgacacgaggttatggaattattaaccatacatttgaa 
gaatttagacctcgcgttaaagctcaaatcggtggtagacgtaacggtgcattgatttct 
atggaccaaggtcaagcaacatcttatgcgattattaacttagaagatcgtggtgttaac 
tttatggaaccaggtactgaagtatatgaaggtatgattgttggtgaacataaccgtgag 

aacgatttaacagtaaatattactaaagcaaagcatcaaacaaacgtacgttcagctact
aaagatcaaacacaaacgatgaatcgtcctagaattttaacattagaagaagcgttacaa 
tttatcaatgatgatgaattggtggaagtaactcctgaaagtattcgcttaagaaagaaa 
atacttaataaatctgcccgtgaaaaagaagcaaaaagagttaaacaattaatgcaagac 
gaacaataa 

Sequence 456

MVDGVVLVVDAYEGTMPQTRFVLKKALEQNLKPVVVVNKIDKPAARPEGVVDEVLDLFIE
LEANDEQLDFPVVYASAVNGTASLDSEKQDENMQSLYETIIDYVPAPVDNSDEPLQFQIA
LLDYNDYVGRIGVGRVFRGKMRVGDNVSLIKLDGTVKNFRVTKIFGYFGLKREEIEEAQA

GDLIAVSGMEDINVGETVTPHDHRDPLPVLRIDEPTLEMTFKVNNSPFAGREGDYVTARQ
IQERLDQQLETDVSLKVTPTDQPDSWVVAGRGELHLSILIENMRREGFELQVSKPQVILR 
EIDGVLSEPFERVQCEVPSENAGAVIESLGARKGEMLDMMTTDNGLTRLIFMVPARGMIG 
YTTEFMSMTRGYGIINHTFEEFRPRVKAQIGGRRNGALISMDQGQATSYAIINLEDRGVN 
FMEPGTEVYEGMIVGEHNRENDLTVNITKAKHQTNVRSATKDQTQTMNRPRILTLEEALQ 

FINDDELVEVTPESIRLRKKILNKSAREKEAKRVKQLMQDEQ*

Sequence 457 

Contig_0478_pos_2736_1723, 

putative peptide of unknown function 

atggaacgattttgttgtgtaaatcaaattaactatattcaaatgaatccgttagaagcc
aaatttaaaacgagcgctctaagatcatggaaaactgatcaggcagatgctcataagctt 
gcttgtttaggaccgacgcttaaacaaacagacagcttacctatacatgagttaatattc 
tttgaattaagagaacgcgtccgttttcatctagaaatcgagaatgaacaaaatcgactt 
aaatttcagatccttgaattactccatcaaacattccctggtttagaaagattgtttagt 

agtcgatattcaatcattgcactcaacatcgcagaaatctttactcattcagacatggtt
cttgatatcgacaaggaggtactgattacacatatattcaattctacagataagggaatg 
tcaatggataaagctacaaaatatgcacttcaattaagggtgattgctcaagaaagctat 
cctaatgtcgatagacattcctttctagtcgaaaaattacgcttacttattcaacaatta 
aaacaatctattcatcatctcaaacaattagatgatgccatgattcaattagcacaacaa 

ctcgattattttgaaaatattcattcgatacctggtattggtaagctaagcacagctatg
attattggggagattggtgatattaagcgatttaaatcaaataaacaactcaatgctttt 
gttggcattgatatcaaacgatatcaatcaggtcatacacactgtagagataccatcaac 
aagcgtggtaataaaaaagcgagaaaacttttattttgggtgattatgaatataataaga 
gggcagcatcattatgacaatcatgtcgtcgattattactacaaactaagaaagcagcct

aatgagaaacctcataagactgccatcattgcttgtataaatcgattattaaaaacaatt
cattatcttgtaatgaatcataaattgtacgattatcaaatgtcaccacattag 

Sequence 458

MERFCCVNQINYIQMNPLEAKFKTSALRSWKTDQADAHKLACLGPTLKQTDSLPIHELIF 
FELRERVRFHLEIENEQNRLKFQILELLHQTFPGLERLFSSRYSIIALNIAEIFTHSDMV
LDIDKEVLITHIFNSTDKGMSMDKATKYALQLRVIAQESYPNVDRHSFLVEKLRLLIQQL
KQSIHHLKQLDDAMIQLAQQLDYFENIHSIPGIGKLSTAMIIGEIGDIKRFKSNKQLNAF
VGIDIKRYQSGHTHCRDTINKRGNKKARKLLFWVIMNIIRGQHHYDNHVVDYYYKLRKQP 
NEKPHKTAIIACINRLLKTIHYLVMNHKLYDYQMSPH*

Sequence 459

Contig_0479_pos_2395_989,

is similar to (with p-value 0.0e+00)

>pir:pir|S19723|S19723 dihydrolipoamide dehydrogenase (EC 1.8.1.4) -
Staphylococcus aureus >gp:gp|X58434|SAPDHDNA_3 S.aureus pdhB, pdhC
and pdhD genes for pyruvate decarboxylase, dihydrolipoamide
acetyltransferase and dihydrolipoamide dehydrogenase. NID: g48871.

atggtagttggagatttcccaattgaaacagatactattgtaataggagcaggtccaggt
ggatatgtcgcagccattcgcgcggctcaattaggacaaaaggtaacaatcgttgagaaa 
ggtaatttaggtggtgtatgcttaaacgttggttgtataccttcaaaagcattactacat 
gcttctcatcgctttgttgaagcgcaaaattcagaaaacttaggggtaattgctgaaagc 
gtttcgttaaactatcaaaaagttcaagaattcaagacttctgtagttaataaattaact 

ggcggtgttgaaggacttttaaaaggtaacaaagtagagattgttagaggtgaagcttat
ttcgttgataacaatagtttacgtgtcatggacgaaaagagtgctcaaacttacaatttc 
aaacatgcgattatagctacaggttcaagaccaattgaaattccaaattttgaatttggt 
aaacgtgttatcgattcaacaggagctttaaatctacaagaagtacctaacaaactagtt 
gtagttggtggcggatatatcggttctgaattaggtactgcttttgcaaactttggctct 

gaagttactatccttgaaggtgcaaaagatattttaggcggatttgaaaagcaaatgaca
caacctgttaaaaaaggtatgaaagaaaaaggtatcgaaatcgttactgaagcaatggca 
aaatctgcagaagaaactgaaaatggtgtcaaagtaacttatgaggcaaaaggtgaggaa 
caaactatcgaagctgattatgtattagttacagttggccgtcgccctaatactgatgaa 
ttaggattagaagaacttggtctgaaatttgctgatcgtggattactagaagtggacaaa 

caaagtcgtacttctattgaaaatatctttgcgattggagatattgtacctggattacca
ttagctcacaaagctagttatgaaggtaaagttgctgctgaagcgatagatggtcaagcc 
gcagaggtagactatattggtatgccagcagtttgctttacagaaccagaattagcacaa 
gttggttatactgaagctcaagcaaaagaagaaggtttatcaattaaagcttctaaattc 
ccttatgcagctaatggacgagctttatcattagatgatacaaatggttttgttaagtta 

attacacttaaagaagatgatacgcttattggagcacaagttgtaggtactggcgcatct
gatattatctctgaattaggtttagctattgagtcaggtatgaatgctgaagatatcgca 
ttaactgtacatgcacacccaactttaggtgaaatgacaatggaagctgctgaaaaagca 
attggttatccaattcatactatgtaa 

Sequence 460

MVVGDFPIETDTIVIGAGPGGYVAAIRAAQLGQKVTIVEKGNLGGVCLNVGCIPSKALLH
ASHRFVEAQNSENLGVIAESVSLNYQKVQEFKTSVVNKLTGGVEGLLKGNKVEIVRGEAY 
FVDNNSLRVMDEKSAQTYNFKHAIIATGSRPIEIPNFEFGKRVIDSTGALNLQEVPNKLV 
VVGGGYIGSELGTAFANFGSEVTILEGAKDILGGFEKQMTQPVKKGMKEKGIEIVTEAMA 

KSAEETENGVKVTYEAKGEEQTIEADYVLVTVGRRPNTDELGLEELGLKFADRGLLEVDK
QSRTSIENIFAIGDIVPGLPLAHKASYEGKVAAEAIDGQAAEVDYIGMPAVCFTEPELAQ 
VGYTEAQAKEEGLSIKASKFPYAANGRALSLDDTNGFVKLITLKEDDTLIGAQVVGTGAS
DIISELGLAIESGMNAEDIALTVHAHPTLGEMTMEAAEKAIGYPIHTM* 

Sequence 461

Contig_0480_pos_567_1610,

is similar to (with p-value 0.0e+00)

>gp:gp|AJ005352|SAA005352_4 Staphylococcus aureus, Sst putative iron
transport operon. NID: g3724154.

atgaaaaaaacagtcttatttttattattgtctctagttttagttttaacggcttgtagt
aatagttcgaataataattcaacttcgaaaaagaaaaatagtgattctaaagaaactgta 
accatcaaaaatagttttgaagcaagtggtaaagaaaataatggcagtgataagaaaaaa 
atctctaatactgtcgaagtaccaaagaatcctaaaaatgccgttgtattagattatgga 
gcgcttgatgtgttgaaagaattaggtgtggctgataaagtaaaaggtttacctaaaggt 

gaaaataaccaatctttacctaaatttttagatgaatttaaagatgataagtatattaat
actggaaatttaaaagaagtgaactttgataaagttgcatcagctaaaccagatgtgatt 
tttatttcaggaagaacagctaatcagaaaaatttagatgaatttaaaaaagctgcacca 
aaagctaaagttgtatatgtaggtacaagtgatgacaacttaattaaagatatgaaaaaa 
aatacagaaaatttagggaaaatctacgataaagaagataaagctaaaaaaattaataaa
gatttagatagaaaaatatctgatatgaaagataaaactaaagactttaataagaaagta
atgtatttattggttaacgaaggtgaactatcaacgtttggaccaggaggaagatttggt 
ggtttagtgtttgatacattaggatttaaacctgcagacaaaaaggttagcaaaagcccg 
catggtcaaaatataaataatgaatatattaacaagcagaatccagatgttattttagct 
atggatcgtggttcagttgtaggtggtaaagcaacaacaaatcaagttttaaaaaacaaa
gttataaaaaatgtaaaagcagtaaaaagtaatcatatttacgaattagatccaaaacta 
tggtatttctcttcaggatcttcaacgacaactatcaaacaaattgatgaattaaatgaa 
gtagtagagaaagttgaaaaataa 

Sequence 462

MKKTVLFLLLSLVLVLTACSNSSNNNSTSKKKNSDSKETVTIKNSFEASGKENNGSDKKK 
ISNTVEVPKNPKNAVVLDYGALDVLKELGVADKVKGLPKGENNQSLPKFLDEFKDDKYIN 
TGNLKEVNFDKVASAKPDVIFISGRTANQKNLDEFKKAAPKAKVVYVGTSDDNLIKDMKK 
NTENLGKIYDKEDKAKKINKDLDRKISDMKDKTKDFNKKVMYLLVNEGELSTFGPGGRFG 

GLVFDTLGFKPADKKVSKSPHGQNINNEYINKQNPDVILAMDRGSVVGGKATTNQVLKNK
VIKNVKAVKSNHIYELDPKLWYFSSGSSTTTIKQIDELNEVVEKVEK*

Sequence 463

Contig_0480_pos_3934_4680,

putative peptide of unknown function

atggataaaataccattaaaagagttagataatttaagtaaaactgataccactgataaa 
aataaaaaagaatttcgtgcacttcagcaggatattaataactatttgatacctgagttt
aaaaaatataaaaattccactcaacatttaacagcagatactaatgaggttaagcacttg 
aaagaggattatctaaaaactgttgaaaataaagagaaatctatatatgatttaaaagaa 

tttgtagatttatgtaatcgctcaattaaagataatgaagatattttggattatactaaa
ttattcgagaaaaatagaactgaagtggagtctgacattaataaagcacaaaataaagaa 
gatgcaagtcaacttaaatcaaaattagaagaaaataatcaacaattaaaagatactgct
aaaaaatatttaaattcttcaaataatgattctgattcagcgaaagaagcaatcaaaaat 
catatttcaccacttattgacaaacaaattacggatattaataaaacaaatatttctgat 

aatcatgttgataatgctagaaaaaatgcaattgagatgtattatagtttgcaaaattat
tatgatacgagagtagatacgattaaaactagcgaaaaattagctcaaattgatgttgaa 
cgattgccaaaagagggaaaagatatatcagaaatggataaatcgttcaaaagagaattt 
aaaaaaataaaagaaagtgtaaattaa 

Sequence 464

MDKIPLKELDNLSKTDTTDKNKKEFRALQQDINNYLIPEFKKYKNSTQHLTADTNEVKHL
KEDYLKTVENKEKSIYDLKEFVDLCNRSIKDNEDILDYTKLFEKNRTEVESDINKAQNKE
DASQLKSKLEENNQQLKDTAKKYLNSSNNDSDSAKEAIKNHISPLIDKQITDINKTNISD 
NHVDNARKNAIEMYYSLQNYYDTRVDTIKTSEKLAQIDVERLPKEGKDISEMDKSFKREF

KKIKESVN*

Sequence 465

Contig_0480_pos_4829_5149,

is similar to (with p-value 1.0e-17) 

>sp:sp|P39914|YTXJ_BACSU HYPOTHETICAL 12.4 KD PROTEIN IN MURC-AROA
INTERGENIC REGION (ORF2) (ORF3). >pir:pir|S21420|S21420 hypothetical
protein 2 - Bacillus subtilis >gp:gp|AF008220|AF008220_116 Bacillus
subtilis rrnB-dnaB genomic region. NID: g2293135.
>gp:gp|X65945|BSAROAG_2 B. subtilis aroA-aroG gene. NID: g39812.
>gp:gp|Z99119|BSUB0016_49 Bacillus subtilis complete genome (section
16 of 21): from 2997771 to 3213410. NID: g2635411.

atggctattaagctgagttcaattgaccagtttgaacaagtattagaagaaaataaatat
gtttttgtattaaaacacagtgaaacttgtccaatttctgcaaatgcgtatgatcaattt 
aataagtttttatatgaaagagacatagatggttattatctaatcgttcaacaagagcgt
aaactatctgattatatcgcagagaaaacaaacgtaaaacacgaatcaccacaagctttt
tattttgtagatggtgaaatgaagtggaatgcagaccacgatgatattaacgtttctcaa 
cttgctcaagctgaggaataa

Sequence 466

MAIKLSSIDQFEQVLEENKYVFVLKHSETCPISANAYDQFNKFLYERDIDGYYLIVQQER
KLSDYIAEKTNVKHESPQAFYFVDGEMKWNADHDDINVSQLAQAEE*

Sequence 467

Contig_0480_pos_11347_10484, 
is similar to (with p-value 2.0e-54) 
>gp:gp|Y17116|SEY17116_1 Staphylococcus epidermidis gene encoding
fibrinogen-binding protein, complete CDS. NID: g3201549.

atgacgattgatagtggattttatcaaacacctaaatacagcttagggaactatgtatgg 
tatgacactaataaagatggtattcaaggtgatgatgaaaaaggaatctctggagttaaa 

gtgacgttaaaagatgaaaacggaaatatcattagtacaactacaaccgatgaaaatgga
aagtatcaatttgataatttaaatagtggtaattatattgttcattttgataaaccttca 
ggtatgactcaaacaacaacagattctggtgatgatgacgaacaggatgctgatggggaa 
gaagttcatgtaacaattactgatcatgatgactttagtatagataacggatactatgat 
gacgaatcggattccgatagtgactcagacagcgactcagattccgatagtgattcagac 

tccgatagcgactcggattcagacagcgactcagattcagacagcgactcggattctgat
agcgactcggattcagacagcgactcagactcagacagtgattcagattcagacagtgat 
tcagattcagacagcgactcagattccgatagtgattcagactcagacagcgactcagat 
tccgatagtgattcagactcagacagtgactcagattctgatagtgattcagactcagac 
agtgattcagattccgatagtgattcagactccgatagcgactcagactcggatagtgac 

tcagactcggatagtgactcagattctgatagtgattcagactcaggcagtgattcggat
tccgatagtgattcagactcagacaacgactcagatttaggcaatagctcagataagagt 
acaaaagataaattacctgtgtaa 

Sequence 468

MTIDSGFYQTPKYSLGNYVWYDTNKDGIQGDDEKGISGVKVTLKDENGNIISTTTTDENG
KYQFDNLNSGNYIVHFDKPSGMTQTTTDSGDDDEQDADGEEVHVTITDHDDFSIDNGYYD 
DESDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSD 
SDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSD 
SDSDSDSDSDSDSDSGSDSDSDSDSDSDNDSDLGNSSDKSTKDKLPV* 

Sequence 469

Contig_0480_pos_10463_9366,

is similar to (with p-value 2.0e-25)

>sp:sp|P54595|YHCK_BACSU HYPOTHETICAL 40.7 KD PROTEIN IN CSPB-GLPP
INTERGENIC REGION. >gp:gp|X96983|BS75DGREG_12 B.subtilis chromosomal
DNA (region 75 degrees: cspB upstream of glpPFKDoperon).
NID: g1239975. >gp:gp|Z99108|BSUB0005_180 Bacillus subtilis complete
genome (section 5 of 21): from 802821 to 1011250. NID: g2633055.

atggttaataaaggagtcggtatggaaatgtttgaagctatcatatataacatatctgtc
atggtggcaggtatatatttatttcataggttacaatattctgaaaataaaagaatgatt 
ttttctaaagaatatgtaacagtactaatgacattcgtttctttacttttagcggcatac 
cctatcccatttcaaaacgaatacctcgtccatttaacatttgtacctcttttgttttta 
ggacgttataccaacatgatatatacactcacggctgcttttatcgtatctttagtcgat 

gtatttatctttggaaactcaattatttatggtattacattaatcgttattgcaggtatt
gtcagtgcagtgggaccattcttaaagcaaaacgatatcatttctttacttattttaaat 
ttgattagcattatcattttgttatttttagcattattaagccctatttatgaactcgta 
gagattttagtgcttatccctatttcatttattattacaattgcttcagcaataacattc 
gttgatatatggcactttttctctttagtcaatcgttatgaaaatgaagataaatacgat 

tatcttacaggtctaggtaatgtgaaagaatttgatagacacttaaatgaggtctcaagt
aaagctgaagaaaagaaacaaagtttagccttacttctcattgatattgatggctttaaa 
gatgtaaacgatcattattcacaccaatcaggagatgctgttctcaaacaaatgtctcaa 
ctattaaaaaactatgtcccaaaccagttcaaaatatttagaaacggtggcgaagaattt 
tctgttgtaataagagattacacactagatcaaagcgtgaaattagcagaaaatattcga
agtggtgttgaaaaatcttctttccacctaccaaacaaagaagtaatcaagctatcagtt
tcaattggtgtaggatacttaactcaagaagatcgtaaatctcaacgtaaagtatttaaa 
gatgctgatgacatggtacatgtggctaaaagtgaaggaagaaataaagtcatgtttaat 
cctattgtcaaattataa 

Sequence 470 

MVNKGVGMEMFEAIIYNISVMVAGIYLFHRLQYSENKRMIFSKEYVTVLMTFVSLLLAAY
PIPFQNEYLVHLTFVPLLFLGRYTNMIYTLTAAFIVSLVDVFIFGNSIIYGITLIVIAGI
VSAVGPFLKQNDIISLLILNLISIIILLFLALLSPIYELVEILVLIPISFIITIASAITF
VDIWHFFSLVNRYENEDKYDYLTGLGNVKEFDRHLNEVSSKAEEKKQSLALLLIDIDGFK
DVNDHYSHQSGDAVLKQMSQLLKNYVPNQFKIFRNGGEEFSVVIRDYTLDQSVKLAENIR 
SGVEKSSFHLPNKEVIKLSVSIGVGYLTQEDRKSQRKVFKDADDMVHVAKSEGRNKVMFN 
PIVKL* 

Sequence 471

Contig_0480_pos_9060_8410,

putative peptide of unknown function 

atgaatcgtattgcccatagttatggtttacatgatacatacagttttgtgacatcaact 
gcaattattttctcattaaatgatcgtactagtacgaggttgattcgtattcgcgaacgt 

acaaccgatcttgagaaaattgctttaaccaatagcctatctcgtaaaatttcgagtaag
caacttacaattgacgaagcaaaaagtgagttactgcaacttaaacgtgcgtctcttcag 
tattctttcttaacaaatctcattgctgcctttgtagcttgtggttttttcttattcatg 
tttggtggcgtagcttccgacgcttggattgcatgcctagcgggtggcatagctttttta 
acgtttagtttcgtgcaaaaatatatacaaattaaattcttttcagagtttgtagcatct 

gctgttgttattagtattgcagcaatattcactaaactaggtatagctaaaaatcaagac
attattactattgcaagtgtcatgcctctcgttcccggtattttgattactaacgctatt 
cgtgacttacttgccggagagttacttgctggtatgtcacgtggtgttgaagctgcttta 
actgcatttgctattggtgcaggagtagctattgtattactattattataa 

Sequence 472

MNRIAHSYGLHDTYSFVTSTAIIFSLNDRTSTRLIRIRERTTDLEKIALTNSLSRKISSK 
QLTIDEAKSELLQLKRASLQYSFLTNLIAAFVACGFFLFMFGGVASDAWIACLAGGIAFL 
TFSFVQKYIQIKFFSEFVASAVVISIAAIFTKLGIAKNQDIITIASVMPLVPGILITNAI 
RDLLAGELLAGMSRGVEAALTAFAIGAGVAIVLLLL* 

Sequence 473 

Contig_0480_pos_8394_7900,

putative peptide of unknown function 

atgtttatttatctgtttcactttatcattagtttcattgccacagtccttttttcaatt 
atatttaatgcacctaaaaaattgctattagcttgtggatttgttggagctgttgcttgg
acaatatatcagatgacagtaggtatggatttaggtaaagttggcgcttcatttttagga 
agtctaatattaggattaatgagtcatacaatgagtagacggtacaagcaacctgttatt 
atatttatcgtccccggcattatacctctcgttccaggtggcgcagcatatgaagctaca 
agatttttagtatcaaataattatacgaatgcagttaatacttttttagaggtaacatta 
atttctggtgcaattgcattcggtatacttgtatctgaaatagtctattacatttattca
cgcatcaagcaatcttatggtaaaatcaagggtaaaacttataaaaaatcctataatatg 
aataatagagtataa 

Sequence 474 

MFIYLFHFIISFIATVLFSIIFNAPKKLLLACGFVGAVAWTIYQMTVGMDLGKVGASFLG
SLILGLMSHTMSRRYKQPVIIFIVPGIIPLVPGGAAYEATRFLVSNNYTNAVNTFLEVTL
ISGAIAFGILVSEIVYYIYSRIKQSYGKIKGKTYKKSYNMNNRV* 

Sequence 475 
Contig_0480_pos_7726_6641,

is similar to (with p-value 0.0e+00)

>sp:sp|P55179|PEPT_BACSU PEPTIDASE T (EC 3.4.11.-) (AMINOTRI
PEPTIDASE) (TRIPEPTIDASE). >gp:gp|X99339|BSGALE_6 B.subtilis
orfs 1,2,3,4, pepT and galE genes. NID: g1429253. >gp:gp|Z9
9123|BSUB0020_187 Bacillus subtilis complete genome (section
20 of 21): from 3798401 to 4010550. NID: g2636240. >gp:gp|D
83026|D83026_30 Bacillus subtilis genome sequence covering l
ic-cel region. NID: g1783231.
atggatgaacatggttacttatttgctacactcgaaagcaatattaattataatgtacct
actgtcggttttttagcacatgtagacacttcaccagatttcaatgcttctcatgtaaat 
ccgcaaatcattgaagcctataatgggcaacctatcaaacttggtgaatctcagcgtatc 
ttagatcctgatgtttttcctgaattaaataaagttgtgggtcatacactaatggtgaca 
gatggtacatctctactaggcgccgatgataaagcaggtgttgtagaaataatggaaggg 

ataaagtatttaattgatcatcctgacattaaacacggtacaattcgagttggctttaca
cccgatgaagaaattggacgaggcccgcatcaatttgatgttagtcgatttaatgcagat 
tttgcatatacaatggatggcagtcaattaggagaactacaattcgaaagtttcaatgcg 
gcagaggtaactgtcacttgccatggtgttaacgttcatccaggttcagctaaaaatgcc 
atggttaatgcaattagtttaggtcaacagtttaatagtttacttccctcacatgaagtg 

cctgaaagaactgaaggatacgaagggttctatcatttaatgaattttacaggtaatgtt
gaaaaagcaactctacaatatattattcgcgaccatgacaaagaacagtttgagctacgt 
aaaaaacgcatgatggaaattcgtgatgatattaatgttcattataatcattttccaatt 
aaagtagatgtgcatgaccaatattttaacatggcagaaaaaattgaacctttgaaacac 
atcattgatatacctaaacgtgtctttgaggctttagacatcgtacctaacactgaacct 

attcgaggtggtacagatggatcacaattatcttttatggggttacctacacctaatatt
tttactggttgtggcaatttccacggtccttttgaatacgcttctatcgatgtaatggaa 
aaggctgttcatgttgtcgttggtattgctcaagaagtagcaaacagccatcaatcttat 
aaataa 

Sequence 476

MDEHGYLFATLESNINYNVPTVGFLAHVDTSPDFNASHVNPQIIEAYNGQPIKLGESQRI
LDPDVFPELNKVVGHTLMVTDGTSLLGADDKAGVVEIMEGIKYLIDHPDIKHGTIRVGFT
PDEEIGRGPHQFDVSRFNADFAYTMDGSQLGELQFESFNAAEVTVTCHGVNVHPGSAKNA
MVNAISLGQQFNSLLPSHEVPERTEGYEGFYHLMNFTGNVEKATLQYIIRDHDKEQFELR

KKRMMEIRDDINVHYNHFPIKVDVHDQYFNMAEKIEPLKHIIDIPKRVFEALDIVPNTEP
IRGGTDGSQLSFMGLPTPNIFTGCGNFHGPFEYASIDVMEKAVHVVVGIAQEVANSHQSY
K* 

Sequence 477 
Contig_0480_pos_3029_2097,

is similar to (with p-value 7.0e-41) 

>sp:sp|P18579|MURB_BACSU UDP-N-ACETYLENOLPYRUVOYLGLUCOSAMINE
REDUCTASE (EC 1.1.1.158) (UDP-N-ACETYLMURAMATE DEHYDROGENA
SE). >pir:pir|S26500|A43727 probable division initiation reg
ulatory protein 1 - Bacillus subtilis >gp:gp|M31827|BACDDSA_
2 Bacillus subtilis (clone lambda-BSl) cell division and spo
rulation protein (dds) gene, complete cds. NID: g142831. >gp
:gp|Z99111|BSUB0008_195 Bacillus subtilis complete genome (s
ection 8 of 21): from 1394791 to 1603020. NID: g2633699.

atgttcaaaacattgaataaaaatgacatcttacgcggattagagtcaattcttcctaaa
gatattattaaagtggatgaacctctcaagcgttatacatatacagaaacaggaggagag 
gcagatttttatttatcccctaccaaaaatgaagaagtccaagccatcgtaaagtttgcc 
catgagaacagtataccggtaacttatttaggaaatgggtctaacattatcattcgagaa 
ggtggaattcgaggaatcgtcctcagcttattatctctcaatcatattgaaacctctgat 

gatgcaattatagcaggtagtggtgcagcaattattgacgtttcaaatgttgcacgtgac
catgtattaaccggtttagaatttgcatgcggtatccctgggtcaattggtggcgccgta 
ttcatgaatgctggtgcttatggcggagaagttaaagactgtattgactatgcattatgt 
gtcaatgaaaaaggtgatttattaaagctcactacagctgaactggaattagactataga 
aatagtgtggtacaacaaaaacatttagttgtattagaggctgctttcaccttagaacca

ggtaaattagatgaaattcaggccaaaatggatgatcttactgaaagacgtgaatctaaa
caaccgcttgaattcccttcttgcggaagtgttttccaaagaccaccgggtcattttgca 
ggtaaactcattcaagattctaatttacagggctatcgaatcggtggcgttgaagtttca 
actaagcatgcgggattcatggttaatgtagacaacggtacagcaactgattatgaagca 
cttatacatcacgtacaaaaaatagttaaagaaaaattcgatgttgaattgaatactgag
gtacgtattataggtgatcatcccacagattaa

Sequence 478 

MFKTLNKNDILRGLESILPKDIIKVDEPLKRYTYTETGGEADFYLSPTKNEEVQAIVKFA
HENSIPVTYLGNGSNIIIREGGIRGIVLSLLSLNHIETSDDAIIAGSGAAIIDVSNVARD
HVLTGLEFACGIPGSIGGAVFMNAGAYGGEVKDCIDYALCVNEKGDLLKLTTAELELDYR 
NSVVQQKHLVVLEAAFTLEPGKLDEIQAKMDDLTERRESKQPLEFPSCGSVFQRPPGHFA 
GKLIQDSNLQGYRIGGVEVSTKHAGFMVNVDNGTATDYEALIHHVQKIVKEKFDVELNTE 
VRIIGDHPTD* 

Sequence 479 

Contig_0481_pos_1175_1876,

is similar to (with p-value 0.0e+00)

>sp:sp|P37478|YYCF_BACSU HYPOTHETICAL 27.2 KD SENSORY TRANSD
UCTION PROTEIN IN ROCR-PURA INTERGENIC REGION. >gp:gp|D26185
|BAC180K_1 B. subtilis DNA, 180 kilobase region of replicati
on origin. NID: g467326. >gp:gp|D78193|BACGNTZA_34 Bacillus
subtilis 36kb sequence between gntZ and trnY genes encoding
34 ORFs. NID: g1064780. >gp:gp|Z99124|BSUB0021_146 Bacillus
subtilis complete genome (section 21 of 21): from 3999281 to
4214814. NID: g2636442.
atggctagaaaagttgttgtagttgacgatgaaaaaccaattgctgatattttagaattt 
aatttaaaaaaagaaggttacgacgtatattgcgcttatgacggtaatgacgcagtagat 
ttaatctatgaagaagaaccagatatcgtcttacttgatatcatgttacctggtagagat 

ggtatggaagtgtgtcgtgaagtgcgtaaaaagtatgaaatgccaattatcatgctgaca
gcgaaagattctgaaattgataaagttttaggtcttgaattaggtgcagatgactacgta 
actaaaccatttagtactcgtgaactcatcgcacgtgtgaaagcgaacttacgccgtcat 
tattcacaaccagctcaagaagtaagtggtgcgacaaatgaaattacaattaaagatatt 
gtgatttatccagatgcatattcaattaaaaaacgtggagaagacattgaattaacgcat 

cgtgaattcgagctgttccattatctttctaaacatatgggtcaagtcatgacacgtgaa
cacttactacaaacagtgtggggttacgattatttcggtgatgttcgtactgtggacgta 
acaattcgccgtttaagagaaaaaattgaagatgatccatctcatccagaatacattgtg 
acacgtagaggcgttggatacttcctccaacaacatgattag 

Sequence 480

MARKVVVVDDEKPIADILEFNLKKEGYDVYCAYDGNDAVDLIYEEEPDIVLLDIMLPGRD
GMEVCREVRKKYEMPIIMLTAKDSEIDKVLGLELGADDYVTKPFSTRELIARVKANLRRH
YSQPAQEVSGATNEITIKDIVIYPDAYSIKKRGEDIELTHREFELFHYLSKHMGQVMTRE
HLLQTVWGYDYFGDVRTVDVTIRRLREKIEDDPSHPEYIVTRRGVGYFLQQHD* 

Sequence 481 

Contig_0481_pos_1889_3721,

is similar to (with p-value 0.0e+00) 

>gp:gp|D78193|BACGNTZA_33 Bacillus subtilis 36kb sequence be
tween gntZ and trnY genes encoding 34 ORFs. NID: g1064780. >
gp:gp|Z99124|BSUB0021_145 Bacillus subtilis complete genome
(section 21 of 21): from 3999281 to 4214814. NID: g2636442.
atgaagtggcttaaacaactacaatcccttcacacgaaactcgttattgtttatgtacta 
ctcattattattggtatgcaaatcatcggtttgtattttacgaatagtttagaaaaggaa 
ttactcgataacttcaagaagaacataacacaatatgcgaagcaattagacgtcaatatt
gaaaaggtttataaagataaagataaaggttcagtcaacgctcaaaaggatatccaagac 
cttttgaatgaatatgcgaatcgccaagaaataggagaaatacgctttattgataaagac
caaattatcatggcaacaaccaagcagtctaaccgtggtcttatcaatcaaaaggttaac 
gacggttcagttcaaaaggcgctctccttagggcaaacgaatgatcatatggttcttaag 
gattacggaagtggtaaagagcgtgtttgggtatataatataccggttaaagttgataaa
cagacaatcggtgatatatacatagaatcgaaaattaatgatgtatacaatcagctgaac 
aacattaatcagatattcatcgtagggacagcgatatcactattcattacagtaatacta
ggattcttcattgcacgaacgattactaagccgataaccgatatgcgtaaccaaaccgtt 
gagatgtctaaaggtaactacacgcaacgagtgaagatatacggtaacgatgaaatcggt
gagctcgcacttgccttcaataacttatcgaaacgtgtccaagaagcacaagcgaataca
gaaagtgagaaacgtcgcctagattctgttatcacacatatgagcgatggtattcttgcg 
acagatcgccgtggacgtgtacgtattgcaaacgacatggcgctgaaaatgctcggtctc 
gcgaaagaagatgtcatcggctactacatgcttggtgtccttaacttagaaaatgaattc 
tcattagaagaaattcaagaaaatagtgattccttcttgttagatattaacgaagaagaa
ggcattattgcacgtgtaaactttagtacgattgtacaagaaacaggtttcgtgacaggt 
tacattgccgtactacatgatgtcacagaacaacaacaagtcgaacgtgaacgtcgcgaa 
ttcgttgcgaatgtttcacatgaattacgtacaccactgacatcgatgaatagctatatc 
gaagcacttgaagaaggtgcttggcaagataaagaactggcaccatcattcctatctgtc 

acacgcgaagagactgaacgtatgattcgtttagtgaatgatttacttcaattatctaaa
atggataatgaatcagatcaaattacgaaagaaattatcgacttcaatatgtttatcaac 
aaaattattaaccgtcatgaaatgacagcgaaagatacgacattcgtacgcgaaattccg 
caacaaaccatctttgctgaaatcgatccagacaagatgacacaagtatttgataatgtc 
attaccaatgcaatgaaatattcacgtggcgagaaacgtgttgagtttcatgtgaaacaa 

aatgcactttacaatagaatgacgattcgtattaaagataatggtattggaataccgatt
aacaaggtagataaaatatttgatagattctatcgtgtagataaagcacgtacacgtaag 
atgggtggtacaggactaggtctagctatttccaaagagattgtcgaagcacataacggt 
cgaatttgggctaacagtgtggaaggacaaggtacgtcaatctttattacacttccttgc 
gaaatcattgaagacggtgattgggatgaataa 

Sequence 482 

MKWLKQLQSLHTKLVIVYVLLIIIGMQIIGLYFTNSLEKELLDNFKKNITQYAKQLDVNI 
EKVYKDKDKGSVNAQKDIQDLLNEYANRQEIGEIRFIDKDQIIMATTKQSNRGLINQKVN
DGSVQKALSLGQTNDHMVLKDYGSGKERVWVYNIPVKVDKQTIGDIYIESKINDVYNQLN

NINQIFIVGTAISLFITVILGFFIARTITKPITDMRNQTVEMSKGNYTQRVKIYGNDEIG
ELALAFNNLSKRVQEAQANTESEKRRLDSVITHMSDGILATDRRGRVRIANDMALKMLGL 
AKEDVIGYYMLGVLNLENEFSLEEIQENSDSFLLDINEEEGIIARVNFSTIVQETGFVTG 
YIAVLHDVTEQQQVERERREFVANVSHELRTPLTSMNSYIEALEEGAWQDKELAPSFLSV 
TREETERMIRLVNDLLQLSKMDNESDQITKEIIDFNMFINKIINRHEMTAKDTTFVREIP 

QQTIFAEIDPDKMTQVFDNVITNAMKYSRGEKRVEFHVKQNALYNRMTIRIKDNGIGIPI
NKVDKIFDRFYRVDKARTRKMGGTGLGLAISKEIVEAHNGRIWANSVEGQGTSIFITLPC 
EIIEDGDWDE* 

Sequence 483 
Contig_0481_pos_3765_5051,

putative peptide of unknown function 

atgagtatcgttttgacatacatggtctggaacttttctccagacctttcaaatattgat 
aacacggataatagtaaaagtgataagcctaaaccacttactaaaccaatgactgcagaa
atggaaggaacgattacaccatttcaaatcgtgcattctagagatgaaaaatctcaagga 

acagtggcatcaggtgcagtcttagacaagatgattcaacctttaaaaaatcaagaagtt
aaatctgtatcacatctgaaaagggaacataaccttgttatacctgaactaagcaacgac 
tttatcgtcctagatttcacttatgatttgccactttcaacatacttaagtcaagtactc 
gatatcgatgcgaaagtgccgaataactttaattttgatcgcctccttatcgatcaagat 
cataataaccacgtcgtactatatgcgattagcaaagaccgtcatgaagtagttaaactt 

aagacaacgatgaaagggaataacgttgacaaagcttttaaaagtatcgaacctgacatg
caaccctatacggaaatcatcacgaataaagatacaatcgacaaagcaacacacgtgttt 
gcaccaagcaaaccgaaagacttaaagacgtatcgcatggtcttcaatacgatcagtgtt 
gaacgcatgaactcaatactatttgatgattcaacgattgttcgtagctctcaaagtggt 
acgacaacatacaacaacaatactggtgtcgccaactataacgataaagatgaaatgtat 

cattataagaatttatctgaagacgcgaaaagttcaagcaacatgcaagaaaccatccca
ggcacatacgagtttataaatagtcatggtggcttcttaaatgaagattatcgcctattt 
aagacagataatagaacggggaaactcacatatcaaagattcctcaacggtcacccaacg 
tttaataaacataacttcaatgaaatccaagtcacatggggggataaaggcgtttacgat 
tatcaacgttcgctacttaagacggacgtcacactgaacagtgaagaatctaaatccgtc 

cctaccgttgagtccgtgcgttctgcattagccaaccatcctgatattgattttgaaaag
gtaacgaacattgcgattggttatgatatggacgacaaggcaaataacgaagatattgaa 
gttcaacgtaactgtgaattaataccacgttggtttgtagaatacgatggcaattggtat 
gcctataaagatgggaggcttgaataa

Sequence 484

MSIVLTYMVWNFSPDLSNIDNTDNSKSDKPKPLTKPMTAEMEGTITPFQIVHSRDEKSQG 
TVASGAVLDKMIQPLKNQEVKSVSHLKREHNLVIPELSNDFIVLDFTYDLPLSTYLSQVL 
DIDAKVPNNFNFDRLLIDQDHNNHVVLYAISKDRHEVVKLKTTMKGNNVDKAFKSIEPDM 
QPYTEIITNKDTIDKATHVFAPSKPKDLKTYRMVFNTISVERMNSILFDDSTIVRSSQSG
TTTYNNNTGVANYNDKDEMYHYKNLSEDAKSSSNMQETIPGTYEFINSHGGFLNEDYRLF
KTDNRTGKLTYQRFLNGHPTFNKHNFNEIQVTWGDKGVYDYQRSLLKTDVTLNSEESKSV 
PTVESVRSALANHPDIDFEKVTNIAIGYDMDDKANNEDIEVQRNCELIPRWFVEYDGNWY 
AYKDGRLE* 

Sequence 485 

Contig_0481_pos_5052_5843,

putative peptide of unknown function 

atgaactggaaactcacgaaaacacttttcattttcgtttttattcttgtgaacatcttt 
ttagtcatcgtttatattgataaagtgaataaatcacaagttaatgactcggaaaaggta
aacgaggtcaattttcaacaagaagaaattgacgtgcccaaggatgtcttgaatcaaaat 
gttaaagatactgaacttgaacaaattactgcccgttcaaagaatttctcaagttatgcg 
aaagatcattcaagcatgcaaacgtctgattccgacaaaacacttgaaggagatattgat 
aaaggcgttcaagtgagtgataagaacttacaagatatcaaagagtacattgcaaagaaa 
atctttaacggtaaagagtatcagttaagtgatttaactaaagataaagtcacttacgaa
caaacgtataaagattatccgattatgaataatagtaaagcgcgcctaacgtttaatttg
agcgatggcaaggcgacaagctataaacagacagcgatggatgatatacaagtagctaaa 
ggttcaaatagcacgaagaaacaagtcatcacgccgcgtaaagctattgaagccctttat 
tacaatagatatttaaaacaaaatgatcaagttcttgatgcacgcctaggctattattca 
gttgtaaaggaaacaaacgttcaattactccaacctaactgggaaattaaagtaaaacat
aaaggcaaggatgaagttcaaacctattatgtagaagctacaaatcataatccgaaagtg 
attgattattag 

Sequence 486

MNWKLTKTLFIFVFILVNIFLVIVYIDKVNKSQVNDSEKVNEVNFQQEEIDVPKDVLNQN
VKDTELEQITARSKNFSSYAKDHSSMQTSDSDKTLEGDIDKGVQVSDKNLQDIKEYIAKK 
IFNGKEYQLSDLTKDKVTYEQTYKDYPIMNNSKARLTFNLSDGKATSYKQTAMDDIQVAK 
GSNSTKKQVITPRKAIEALYYNRYLKQNDQVLDARLGYYSVVKETNVQLLQPNWEIKVKH 
KGKDEVQTYYVEATNHNPKVIDY* 

Sequence 487

Contig_0481_pos_6579_7265,

is similar to (with p-value 8.0e-79) 

>gp:gp|D78193|BACGNTZA_30 Bacillus subtilis 36kb sequence be
tween gntZ and trnY genes encoding 34 ORFs. NID: g1064780. >
gp:gp|Z99124|BSUB0021_142 Bacillus subtilis complete genome
(section 21 of 21): from 3999281 to 4214814. NID: g2636442.
atggaagaacttttcagccaaatcgacagaaacattaaggatttaaacggaattttagtg 
acacatgaacacatcgaccatattaaaggtcttggtgttttagcacgtaaatataaactt 
ccgatttacgcgaatgagaatacatggaaagcgatagagaagaaagatagccgcattcca
atggatcagaaatttatctttaatccatatgaaacgaaatctcttgcaggatttgatata 
gaatcatttaacgtgtcacatgacgcgattgatccacaattctacatcttccacaataac 
tataagaaatttacgatgataactgacactggttacgtttcagatcgtatgaaaggtatg 
attcaaggtagtgatgtctttatgtttgaaagtaatcacgatgtcgatatgttacgcatg 
tgtcgctatccatggaagacgaaacaacgtattttaagtgatatgggtcacgtatccaat
gaagacgcgggtcttgcgatgagtgatgtcattacaggtaatacgaaacgtatatacctc 
tctcatttgtcacaagacaataatatgaaagacctcgcacgcatgagtgttggacaagtg 
ctcaacgaacacgatatcgatacagagaaagaagtattgctttgcgataccgataaagca 
caagccacaccgatttatacactataa 

Sequence 488 

MEELFSQIDRNIKDLNGILVTHEHIDHIKGLGVLARKYKLPIYANENTWKAIEKKDSRIP 
MDQKFIFNPYETKSLAGFDIESFNVSHDAIDPQFYIFHNNYKKFTMITDTGYVSDRMKGM 
IQGSDVFMFESNHDVDMLRMCRYPWKTKQRILSDMGHVSNEDAGLAMSDVITGNTKRIYL
SHLSQDNNMKDLARMSVGQVLNEHDIDTEKEVLLCDTDKAQATPIYTL*

Sequence 489 

Contig_0481_pos_8820_7690, 
putative peptide of unknown function

atgaattcgttgcacatagcaggacgtattttcaaacagacgattcgagatgtaagaaca 
ttggcactgttacttattgcacctatattactattgtcgctactatattacatttttaca 
gttgccgataatacgaatggcgtaacagttggggttcacgatgtaccagattcattaatg 
actgaattacatgataaagatattcacgttaaacattataaaaatgacaatgatataagt 

gataaaattaaagacgacaaattaacaggatttttgcacagtgatggtcaaaaagtatca
gtgacttatgctaacgataatcctacacaagcaggagaactaacaggtgcaaatcaaaaa 
tggttaatgagtcataacatgaatgccatgaaagataatactaataaattgcatcaagcg 
ttaactaaaatacaacaaaaaatgcccggggatgggggagacacgcctcatcaagatatg 
gctaaaccatataaactaacaacgcactatttatatggttcatcagattctacgtatttt 

gatatgataaatcctattttaattggattttttgtctttttctttacgtttttaatttct
ggcattggcttattaaaagagcgtacttctggcacattagaacgtttacttgcctctcca 
ataaaaagaagtgaaattatttttggttatgttttcggttatggtagttttagcgttatc 
caaacaatagttgtcgtattatatgcaatttatattctgcatatagacttagtaggttcg 
atatggttcgtactattaacggcaatattaacagcgcttgtcgctgtgacattcggtata 

ttattatctacctttgcttcctcagaattccaaatgattcaatttataccattagtcata
gtgccacaagtactatttgcaggcattataccaattgaatcaatgaataaaggattacaa 
tacttttcacatatcatgccgttattctataccggccaaacgatgcaaaatattatgatc 
aagggttatggattcaacgatatttacatttatttaattgtgttattcgcatttttcatt 
ttcttattgattttaaatattataggcatgaaaagatatagaaaagtttag 

Sequence 490

MNSLHIAGRIFKQTIRDVRTLALLLIAPILLLSLLYYIFTVADNTNGVTVGVHDVPDSLM
TELHDKDIHVKHYKNDNDISDKIKDDKLTGFLHSDGQKVSVTYANDNPTQAGELTGANQK
WLMSHNMNAMKDNTNKLHQALTKIQQKMPGDGGDTPHQDMAKPYKLTTHYLYGSSDSTYF
DMINPILIGFFVFFFTFLISGIGLLKERTSGTLERLLASPIKRSEIIFGYVFGYGSFSVI
QTIVVVLYAIYILHIDLVGSIWFVLLTAILTALVAVTFGILLSTFASSEFQMIQFIPLVI
VPQVLFAGIIPIESMNKGLQYFSHIMPLFYTGQTMQNIMIKGYGFNDIYIYLIVLFAFFI
FLLILNIIGMKRYRKV*

Sequence 491

Contig_0481_pos_7680_7336,

putative peptide of unknown function 

atgaaccaagatattaagtcattagttgaaaccattgtgcctcaacttgaatatttaagc 
gataaacaaagacgtgtcatagaaagtgctattgcattattcagtgaacaaggatttgat 
aaaacgagtactaaagaaattgcgcagcgtgcaaatgtcgcagaaggaacggtatttaag
cagtttaaaagtaaaagaatgttattatatataaatcacaaagcgtgtaagacatcgttc 
tcccccttccatgatgcgctttcatttcaaaaaaattccttaatcaatcgttcatgcgca 
agcctgtttaaactaacacataatcattccctgcaattctcctga 

Sequence 492

MNQDIKSLVETIVPQLEYLSDKQRRVIESAIALFSEQGFDKTSTKEIAQRANVAEGTVFK 
QFKSKRMLLYINHKACKTSFSPFHDALSFQKNSLINRSCASLFKLTHNHSLQFS* 



Sequence 493

Contig_0482_pos_2955_2551,

putative peptide of unknown function 

gtgtcagtaactgtaaaaggacaaactgaaacagaatggcttccagtattggattttaga 
aacaaatctttagcaaagggtagcgcgacaacatttgatattaataaagctcaaaaacgt
tgtttcgttaaagctgcagcattacatggcctaggtctttatatatacaacggggaagaa 
gttccaagcgctaacgacaatgacattacagaattagaagagcgtatcaaccagtttgta 
acttcatctcaagaaaaaggtagagacgcaacgctagacaaaacaatgcgttggttaggt 
attcaaaacattaacaaagttactaaaaaagatatagcaaatgcacatcaaaaactagat
gcaggactaaaacaattagataaggagaattcaaatgttaaatag

Sequence 494 

VSVTVKGQTETEWLPVLDFRNKSLAKGSATTFDINKAQKRCFVKAAALHGLGLYIYNGEE
VPSANDNDITELEERINQFVTSSQEKGRDATLDKTMRWLGIQNINKVTKKDIANAHQKLD
AGLKQLDKENSNVK* 

Sequence 495

Contig_0482_pos_2104_1430,

putative peptide of unknown function

atggtagtaataaaaaactacattacagaagatgacggtacaacaactgtagtcatcaaa 
ggagtagaactagataacaaaacatctttacttttagacaacggttacgaagtagaagca 
gatgtaagagttgtagatccattcaagattacagataagcagcgtagaaaagtatttgct 
ctctgtaacgacatagaagcttacacaggacaaccacgcgactatatgaggtatttgttc 

atggattacgtagaagttctctatggctatgaaaaacgtctctcattgagtgattgcaca
agagaacaagctaaacaagttatagaagttattcttgactgggtgtttcacaacaatata 
ccacttaattataagacaagtgacttactcaaaaatgataaagcgttcctttactggtca 
acagtcaatcgtaactgtgtaatatgcggaacgccacgagcagaacttgcgcattatcac 
acagtaggtcgaggacgtaacagacgaaagatagatcacacagacaacaaagtattagcg 

ctatgttcaagacatcataaagagcagcaccaaataggtatagatagttttaatgagaaa
tacaaattacatgaaagttgggtgtccgtagatgaacgactcaaccgaatgttgaaagga 
gaagtaaatggctga 

Sequence 496

MVVIKNYITEDDGTTTVVIKGVELDNKTSLLLDNGYEVEADVRVVDPFKITDKQRRKVFA
LCNDIEAYTGQPRDYMRYLFMDYVEVLYGYEKRLSLSDCTREQAKQVIEVILDWVFHNNI 
PLNYKTSDLLKNDKAFLYWSTVNRNCVICGTPRAELAHYHTVGRGRNRRKIDHTDNKVLA 
LCSRHHKEQHQIGIDSFNEKYKLHESWVSVDERLNRMLKGEVNG* 

Sequence 497

Contig_0482_pos_1401_643,

putative peptide of unknown function 

atgttcgatgatagcaaaatcaagtatatagaagcactgccagaacgagatacaatcatc 
actttatgggttaagttgctgacattagctggaaagtataacgaacaaggatacattatg 

ttatccgaaagtctaccctataacgaagaaatgttagctaacgaatttaatagacctatc
aattcaataagattagcgttacaaacattcgaaaagctaagcatgattgaagaagtgaat 
ggtgtctttaaagtatctaattgggaaaaacatcagaacatcgaaggtttagaaaagata 
agagaacaaaaccgtttgcgtaaacaaaagcaaagaaaaaaacaaaaacttttagatagt 
cacgtgaagtcacgtgacagtcacgcaacagatatagaagaagataaagaagtagaagaa 

gaaagagaaaaagaagtagataaagatatcttcaaaaactcaattaattacatcatgagt
aaccttactcataatttaactcctaaccaaatggaacagataggatatgccattgatgat 
attggacaacatgcagatgaagttgttgaagtagctactgattatacaaaagacaaaggt 
tgtcatgcaggttacctaatcaaagtgttaaacaactgggctaaagagaacgttaagaat 
aaaaaagaggctgaaaataaaattaaacctaaaaataaaaaaactgtaacagatgatgta 

attgctcaaatggagaaagagctaggagatgaaagttaa



Sequence 498

MFDDSKIKYIEALPERDTIITLWVKLLTLAGKYNEQGYIMLSESLPYNEEMLANEFNRPI
NSIRLALQTFEKLSMIEEVNGVFKVSNWEKHQNIEGLEKIREQNRLRKQKQRKKQKLLDS
HVKSRDSHATDIEEDKEVEEEREKEVDKDIFKNSINYIMSNLTHNLTPNQMEQIGYAIDD 
IGQHADEVVEVATDYTKDKGCHAGYLIKVLNNWAKENVKNKKEAENKIKPKNKKTVTDDV 
IAQMEKELGDES* 

Sequence 499 

Contig_0482_pos_637_284,

putative peptide of unknown function 

atgactaaacaacaagccctagaagtaattaagacaattagacatgtatacaacattgac
tttgacagacctaaattagaaacatgggttaacattttgagccaaaatggggattatgaa
ccgactaaaaaaacagtaatgcaatatatcaatgatgctaatccttatccacctagtatt 
ccaaacataatgagaaaagaagtcaaagtcgtaaaagaagagcctgtcgacgaaaaaact 
gctagacatcgttggagaatgaaaaatgatccagaatacgtagcacaacgtaaaaagata 
ttagacgacttcagaaagaagttaagtgagtttggagtgagtgacgatgaatga

Sequence 500 

MTKQQALEVIKTIRHVYNIDFDRPKLETWVNILSQNGDYEPTKKTVMQYINDANPYPPSI 
PNIMRKEVKVVKEEPVDEKTARHRWRMKNDPEYVAQRKKILDDFRKKLSEFGVSDDE* 

Sequence 501 

Contig_0483_pos_6911_7564,

is similar to (with p-value 1.0e-41) 

>gp:gp|U93874|BSU93874_2 Bacillus subtilis cysteine synthase
(yrhA), cystathionine gamma-lyase (yrhB), YrhC (yrhC), YrhD
(yrhD), formate dehydrogenase chain A (yrhE), YrhF (yrhF),
formate dehydrogenase (yrhG), YrhH (yrhH), regulatory protei
n (yrhI), cytochrome P450 102 (yrhJ), YrhK (yrhK), hypotheti
cal protein YrhL (yrhL), putative anti-SigV factor (yrhM), R
NA polymerase sigma factor SigV (sigV) and YrhO (yrhO) genes
, complete cds, and YrhP (yrhP) gene, partial cds. NID: g193
4604. >gp:gp|Z99117|BSUB0014_205 Bacillus subtilis complete
genome (section 14 of 21): from 2599451 to 2812870. NID: g26
34966.

atgacacctttgggtcaatcacctttagcactaggcgctgacatagttattcatagtgca
actaaatttctaggtggacatagcgatttaattgcaggtgcagcaattactaataataga 
gaggttgcaaatgcattgtacttattacagaacggcacgggcacagccctttctgcatat 
gatagttgggcacttgcaaaacatcttaaaacattaccagttcgttttaaacaatctgtt 
cataatgctgaacgccttgttcaatttttgagtcaaagagaggagatttctgaggtgtat

tacccgggaaataatcttacacatctcaagcaagcttcaactggaggtgcagtgataggt
ttccgacttaaagatgaatctaaagcacaaaagttcgtcgattctcttactttaccactt 
gtatcagtgagtctcggtggtgtagaaactatcctatcacatcccgcaacaatgtctcat 
gcagcagtgccagaagatgtgagacgtgaacgtggcatcactttcgggttattccgttta 
agtgtaggtcttgagaattcagaagaactcatcgcagattttaactacgctttaaaggag 

gctttcaatgagtcatttactgaaccaattaaagagcaacgttttagtagctga

Sequence 502 

MTPLGQSPLALGADIVIHSATKFLGGHSDLIAGAAITNNREVANALYLLQNGTGTALSAY 
DSWALAKHLKTLPVRFKQSVHNAERLVQFLSQREEISEVYYPGNNLTHLKQASTGGAVIG 
FRLKDESKAQKFVDSLTLPLVSVSLGGVETILSHPATMSHAAVPEDVRRERGITFGLFRL
SVGLENSEELIADFNYALKEAFNESFTEPIKEQRFSS* 

Sequence 503 
Contig_0483_pos_14208_0,
is similar to (with p-value 0.0e+00)

>gp:gp|AB015981|AB015981_5 Staphylococcus aureus genes for O
rfA, MnhA, MnhB, MnhC, MnhD, MnhE, MnhF and MnhG, complete c
ds. NID: g4001723.

atgtttatgttgattggtattattggctcatttacaacaggagatattttcaacttgttt 
gtgttctttgaagtctttttaatgtcttcatattgtttactcgttattggtactactaaa
atacaattacaagaaacaattaagtatattttagtcaatgttgtttcatcgtctttcttt 
gtcatgggtgttgcagttttatattcagttgtaggaactttaaatctcgctcatattagt 
gaaagattgtcacaactttctgtacatgacagtggcttagtcaatattgtttttatttta 
tttatctttgtctttgccactaaagcaggcgtttttcctatgtacgtatggctacctggt 
gcttattatgcccctccagtagcgatcatcacgttctttggtgcactattgactaaagtg
ggtgtatacgcaattgcgagaactctaagtttattctttaataatacagtaagcttttct 
cattatgtcatccttttcttagcattacttacaattatttttggatgtataggtgcgata 
gcttactatgatacgaagaaaatcatcctttacaatattatgattgcagtaggtgtcata 
ttagttggtattgctatgatgaacgaatcaggcatgactggtgcaatatattacacacta
catgatatgttagttaaagcttcattgttcttactcattggcgtcatgtacaaaatcact
aaaacgactgacttacgtcattttggtggcttgataaaagggtatcctattctaggttgg 
acattctttattgcagcgctaagcttagcgggtataccaccttttagtggtttctacggt 
aaattctatattgttcgagcgacctttgaaaaaggattttatctaagtggtatcattgta 
cttttatcaagtttaatcgtgttatattcagtcatacgtattttcttaaaaggatttttc
ggtgaagttgaaggatatactttatctaaaaaggtaaatgttaaatatctaacaactatc 
gctgttgcatctacagt 

Sequence 504 

MFMLIGIIGSFTTGDIFNLFVFFEVFLMSSYCLLVIGTTKIQLQETIKYILVNVVSSSFF
VMGVAVLYSVVGTLNLAHISERLSQLSVHDSGLVNIVFILFIFVFATKAGVFPMYVWLPG
AYYAPPVAIITFFGALLTKVGVYAIARTLSLFFNNTVSFSHYVILFLALLTIIFGCIGAI
AYYDTKKIILYNIMIAVGVILVGIAMMNESGMTGAIYYTLHDMLVKASLFLLIGVMYKIT
KTTDLRHFGGLIKGYPILGWTFFIAALSLAGIPPFSGFYGKFYIVRATFEKGFYLSGIIV 

LLSSLIVLYSVIRIFLKGFFGEVEGYTLSKKVNVKYLTTIAVASTV

Sequence 505 

Contig_0483_pos_13585_12029,
putative peptide of unknown function 

atgaaacgatttataccagcttggtatagccgtaacagatggtgggaaagtacctcaaga
ccattctatctaaaaaaacagtatacagattttgacgatatgattagtttaatgacaatg 
catagttcgaataatgtggattatcaattgatagttttaaattttagtccatatcttaga 
acattcctccatcgatatgatttgtatgaaagtcattattggtctgtatttgatgagata 
cagggcgttggacatcaaacgcctcaagctattgattatcgcgatctttcatggccagaa 

ggcactgaatttatttttactccctttcaaattcaagcgattacaggtgataacacgttt
tctaaaattcacttcagccaagaggggtacctgatgtgggtagaggattacaagtatagt
acaattcaaagacgatttgtattcgatgacagaggatttatatcggcagtgcgtacttat 
acacctgatggtgataacaataaaaaacactatttttcaaaagatggggaagaaatattt 
gttgaagacttaaatgttaatacagtaacgattaataaaaatttccaatcaaaatttaaa 

agggttacgtattcatctatggctgagttgatagaagagaaattccaatcatatgtagaa
agagaattgaatgaagatgattctgttatagtggcatctgatgaacgtcataattcaatg 
atggcacgcactattgatgcatcgtctttatgtttttctatttttactgagagaaataaa 
gtggtgacacaagatttatatgactctatttctagagcatattattgtctcgttgataca 
caagctaatcaaaatatgattgaacactacgcaggattgaacatgaatgatattaatctt 

ttaagggtaacgccttttgatgcgaagtcattacctaaccaaagtagtcaattgtatgac
acttatattggattatggatagatggtttggacgagattgaaatacgagagattgtaaac 
agcttatttcaatatattcaacataaagatggctataagttgaaaattttaactaagagt 
agagataatcttacggaaaatcttatagatgaagttgctcatctcaatgatttatatcac 
caagagaaaaaggaaataagtgatgtaattgaagacgtgatacagaataaaaaagaaaca 

atcattgatattgaaacagtaccgtttgaagaagatcttgtaagcgttatttcaaaatta
agagttgtagtagatttatctttagagccgaaactttttttacaaatctgttgtattggc 
gcgggtataccacaaattaataaaaagagaacagattatgttaaacatatgcataatgga 
tatattattgatgacatatcgcaaactgtagaatctttagattattttttggcacattta 
aaaaattggaattattcttatgcatattccatgagattaacggatgattttagttcaatt 

aatattattcatcaaattaatcagttatttaaaggtgatgtttcaagtggcacgtaa

Sequence 506 

MKRFIPAWYSRNRWWESTSRPFYLKKQYTDFDDMISLMTMHSSNNVDYQLIVLNFSPYLR 
TFLHRYDLYESHYWSVFDEIQGVGHQTPQAIDYRDLSWPEGTEFIFTPFQIQAITGDNTF 

SKIHFSQEGYLMWVEDYKYSTIQRRFVFDDRGFISAVRTYTPDGDNNKKHYFSKDGEEIF
VEDLNVNTVTINKNFQSKFKRVTYSSMAELIEEKFQSYVERELNEDDSVIVASDERHNSM 
MARTIDASSLCFSIFTERNKVVTQDLYDSISRAYYCLVDTQANQNMIEHYAGLNMNDINL 
LRVTPFDAKSLPNQSSQLYDTYIGLWIDGLDEIEIREIVNSLFQYIQHKDGYKLKILTKS 
RDNLTENLIDEVAHLNDLYHQEKKEISDVIEDVIQNKKETIIDIETVPFEEDLVSVISKL 

RVVVDLSLEPKLFLQICCIGAGIPQINKKRTDYVKHMHNGYIIDDISQTVESLDYFLAHL
KNWNYSYAYSMRLTDDFSSINIIHQINQLFKGDVSSGT* 

Sequence 507 

Contig_0483_pos_11796_10480,
putative peptide of unknown function

gtgattgacaatgagtattgggataatcaataccaacaagataagacaatacaacgtaat 
tttataaaaccactcatttatgaaaatgaagaacaattacaacaaaaactagaggcagtt 
acatttcctgggcaatatggagataaagttaaacctattcattgtcgcgttagtattcat 
tttgatggttcttatcaatttaatggaaatgagtctattgaagtatcaggacgatttggg
gaatcataccaacccctcattacatggagtcaaaatatcattgctgatgccaataaggtg 
aatcaaatatggccagaatttaaagttgaaggtgatgctaaaatccaatatacattgaga 
ttgacgcctgtttattcaactgatcaaccagtagaaaagctaatatatgaacaagacgat 
ttagacactcccatagaactacctgctcgtccttatcaaacatatgtgagtgtatcaatc 

aaagctaaaggtaaaggaacattatttataggtgctattcataaacgttggtcacgcttg
gaattagggcagttcatattaggcggaaaacgatatagtgatgaaaataagcaagaattt 
atacattacttccatcctggagatttaaaaccaccactcaatgtatattttagtgggtat 
cgtactgctgagggctttgaagggtactttatgatgaaacgtatgaatgctccatttatt 
ttaatagctgatcctagaatcgaaggtggtgccttttacctagggtcagagaattatgaa 

caggcaatccgtaaggtcatccaaaatgctttggattatttgggatttgcgaacaaccaa
ttaattctttctggattatcaatgggatcatttggcgcactttattacgctacaaaatta 
aatccagcggctgttattgtaggaaaacctttgataaatctcggtactattgctaataat 
atgaaactcgttcgtccaaacgattttggaacgtcacttgatattttgcgattgaatcaa 
aatggcataactaacaaagatgttgttcagttagataatcatttttggaagcaaattcag 

catagtgatttgtcaatgaccacatttgcgattgcttacatggagcatgatgattatgac
aaatatgcatttcaagatttattgcctgttcttacaaaacaacatgcacgtgtgataagt 
aaaagaattcctggtagacataatgatgattctgctactgttactcattggtttattaat 
ttttataatttaatcatggaagagcgatttgggagggtaacacatgcaagaagatag 

Sequence 508

VIDNEYWDNQYQQDKTIQRNFIKPLIYENEEQLQQKLEAVTFPGQYGDKVKPIHCRVSIH
FDGSYQFNGNESIEVSGRFGESYQPLITWSQNIIADANKVNQIWPEFKVEGDAKIQYTLR 
LTPVYSTDQPVEKLIYEQDDLDTPIELPARPYQTYVSVSIKAKGKGTLFIGAIHKRWSRL 
ELGQFILGGKRYSDENKQEFIHYFHPGDLKPPLNVYFSGYRTAEGFEGYFMMKRMNAPFI 

LIADPRIEGGAFYLGSENYEQAIRKVIQNALDYLGFANNQLILSGLSMGSFGALYYATKL
NPAAVIVGKPLINLGTIANNMKLVRPNDFGTSLDILRLNQNGITNKDVVQLDNHFWKQIQ 
HSDLSMTTFAIAYMEHDDYDKYAFQDLLPVLTKQHARVISKRIPGRHNDDSATVTHWFIN 
FYNLIMEERFGRVTHARR*

Sequence 509

Contig_0483_pos_10436_9921,
putative peptide of unknown function 

atgtatggtacaaaattacgttttaatcaagataatatctattttgagaaccctttgatg 
ccatccggtacaatcattcacagttggtatatgttaactgattttgcagaagaccgtgta 

agccctaagctacctattttaaaaaaagggcgccaatatcaatttcaatttaattttgaa
gttgaacctgagggtgcggcttattttaaaatgaaattttatcgtaagaataaagaaatt 
cttagtcatcaaattctaaaaaataaaaaagaaaatattgtctatcctagagaagcatat 
tcatatgaattagaacttattaatgctggcatgaatcatctatcttttcacaatataatt 
gtgcaagaattaagagaagatagtaatcaagcttatgaggcaacgcaatatatagatcct 

aagaaaaaacttaaagtaattaatcaaataataaccaatataaggacacatcatctagac
tcatcaaactatcacaggagtgatatgaatggctaa 

Sequence 510 

MYGTKLRFNQDNIYFENPLMPSGTIIHSWYMLTDFAEDRVSPKLPILKKGRQYQFQFNFE
VEPEGAAYFKMKFYRKNKEILSHQILKNKKENIVYPREAYSYELELINAGMNHLSFHNII
VQELREDSNQAYEATQYIDPKKKLKVINQIITNIRTHHLDSSNYHRSDMNG* 

Sequence 511 

Contig_0483_pos_9697_9095, 
is similar to (with p-value 4.0e-42)

>gp:gp|X62035|BSSECA2_1 B.subtilis secA gene (partial). NID:
g48979. 

atgtatccaaaagatgtgcagattttaggagcaatcgctatgcatcaggggaatattgca 
gaaatgcaaacaggagaaggtaagacgcttacagctaccatgcctctgtacttaaatgca
cttacaggtaaaggtgcttatctaatcacaacaaatgattacttagcaaaacgcgatttt
ttagaaatgaaaccactatatgaatggctaggcttgtctgtatcattaggatttgtggac 
attccagaatatgaatacgctgaaaatgaaaaatatgaactgtaccaccatgacattgtt 
tacacgactaatgggcgactagggtttgattatttaattgataatttagctgatgatatt 
cgtgccaaatttttaccgaaattaaactttgctattattgatgaagtcgattctattata
ttagacgctgcccaaacgcctttagttatttctggtgcaccacgtgtacaatctaattta 
tttcataaacttaattctttagtcttttctttatcatctttaacagctttgaattggtta 
agtacatcgtctgacattttgatgccaggcacttcattatgtaagaaaagtgcgttgtta 
taa 

Sequence 512 

MYPKDVQILGAIAMHQGNIAEMQTGEGKTLTATMPLYLNALTGKGAYLITTNDYLAKRDF 
LEMKPLYEWLGLSVSLGFVDIPEYEYAENEKYELYHHDIVYTTNGRLGFDYLIDNLADDI
RAKFLPKLNFAIIDEVDSIILDAAQTPLVISGAPRVQSNLFHKLNSLVFSLSSLTALNWL
STSSDILMPGTSLCKKSALL*

Sequence 513 

Contig_0483_pos_4530_3718,
is similar to (with p-value 3.0e-64)

>sp:sp|P26497|SP0J_BACSU STAGE 0 SPORULATION PROTEIN J. >pir
:pir|S18081|A38536 spo0J93 protein - Bacillus subtilis >gp:g
p|D26185|BAC180K_54 B. subtilis DNA, 180 kilobase region of
replication origin. NID: g467326. >gp:gp|X62539|BSORIGS_11 B
.subtilis genes rpmH, rnpA, 50kd, gidA and gidB. NID: g40020
. >gp:gp|Z99124|BSUB0021_201 Bacillus subtilis complete geno
me (section 21 of 21): from 3999281 to 4214814. NID: g263644
2.

atgaataatgatgatagtgtgcaatttattgcactagaattaattagacctaatccttat 
cagccacgtaagacgtttgaagaagaacgactcaatgatttagcttcatcaattcaacaa 
catggtatattacagcctattgtattacgtcaaactgttcaaggttactatattgttgtg 
ggtgagcgacgatttagagcatctcagttggcgggattaacagaagtgccagctattatt 
aaagaactatctgatgaagatatgatggaattggcaattattgaaaatttacagagagaa 
gatttaaatgccattgaagaagcagaaagttataaaaaaatgatgacagatttgaatatt 
acacaacaagaggttgcgagacgattaggtaagtcacgtccttatattgccaatatgctt 
aggttattacagttacctaaaaatgttgctcaaatggttcaacaaggagcgttatcaagt 
gctcatgggcgtacgttattaactttgaaagacgccagtaaaataaaaaagacggcaaaa 
caagccactcaggagtcttggagtgtaaggtatttagaggagtacgtcaatggtttagtc 
agtaaagacatctcaatgaaactggacagagagaccaagggaagtaaaccgaaaatgatt 
caacagcaggaaagatttttaaaaaagcaatatggtgcgaaagtagatatttcgacatct 
aaaaatgtcgggaaaatcacgtttgaatttaaatctgaagcagaattcaaacgcttgatt 
cgtcaacttaataaagattataaggaatattaa 

Sequence 514 

MNNDDSVQFIALELIRPNPYQPRKTFEEERLNDLASSIQQHGILQPIVLRQTVQGYYIVV 
GERRFRASQLAGLTEVPAIIKELSDEDMMELAIIENLQREDLNAIEEAESYKKMMTDLNI
TQQEVARRLGKSRPYIANMLRLLQLPKNVAQMVQQGALSSAHGRTLLTLKDASKIKKTAK 
QATQESWSVRYLEEYVNGLVSKDISMKLDRETKGSKPKMIQQQERFLKKQYGAKVDISTS 
KNVGKITFEFKSEAEFKRLIRQLNKDYKEY*

Sequence 515

Contig_0483_pos_3476_2688,

is similar to (with p-value 4.0e-20) 

>gp:gp|AJ222587|BS16829KB_25 Bacillus subtilis 29kB DNA frag
ment from ykwC gene to cse15 gene. NID: g2632216. >gp:gp|Z99
111|BSUB0008_93 Bacillus subtilis complete genome (section 8
of 21): from 1394791 to 1603020. NID: g2633699.
atgattttaatttatatccttgtagcgattgttgttatagcaattttgaataagattatt 
gagcaagcatttaaaattcaaaataaaagcaaaaaaggaaataaaaaacgttcaaaaaca
ctgatttctcttgtacaaaacatagtaaaatatatcgtatggtttgttgttatcacaaca 
attttaagtaagtttggtattagcgtcgaaggtatcatcgctagtgctggagttgtaggt
attgcagttggtttcggtgcgcaaacaatagtaaaagatattattacaggtttctttatt 
atctttgaaaatcaatttgatgtgggtgactatgttaaaatcaatagttcaggaactacg 
gtagcagaaggtactgtgaaatctattggtttaagatcaacgcgaattaatacaatttcg 
ggagaactgactattttacctaatggtagcatgggggaaattacgaacttttcaattaca
aatgggactgctattgtagaactaccagtatcagttgatgaaaatatagatcaagttgaa 
aagaaactcaatcgtttatttgtttctttacgtagtaaatattacttatttgtcagcgat 
ccagttgttgatggcattgatgcgatagaatctaataaggttactatacgaatttcagcg 
gaaacaattcctggtgaaggattttcaggcgctcgtattattcgtaaggaagctcaaaaa 
atgtttagacaagaaggtattcgcatgccacaaccagtcatttcaaattataatgaagaa
aaaagctaa 

Sequence 516 

MILIYILVAIVVIAILNKIIEQAFKIQNKSKKGNKKRSKTLISLVQNIVKYIVWFVVITT 
ILSKFGISVEGIIASAGVVGIAVGFGAQTIVKDIITGFFIIFENQFDVGDYVKINSSGTT
VAEGTVKSIGLRSTRINTISGELTILPNGSMGEITNFSITNGTAIVELPVSVDENIDQVE
KKLNRLFVSLRSKYYLFVSDPVVDGIDAIESNKVTIRISAETIPGEGFSGARIIRKEAQK
MFRQEGIRMPQPVISNYNEEKS* 

Sequence 517

Contig_0483_pos_2284_1547,

is similar to (with p-value 2.0e-82) 

>sp:sp|P37518|YYAF_BACSU HYPOTHETICAL 40.1 KD GTP-BINDING PR
OTEIN IN RPSF-SPO0J INTERGENIC REGION. >gp:gp|D26185|BAC180K
_50 B. subtilis DNA, 180 kilobase region of replication orig
in. NID: g467326. >gp:gp|Z99124|BSUB0021_197 Bacillus subtil
is complete genome (section 21 of 21): from 3999281 to 42148
14. NID: g2636442.

atggttcaacctaaaaaaacaattcctacaacttttgagtttactgatattgcaggtatt 
gttaaaggtgcatctaagggcgaaggtttaggaaataaattcctttcacatattcgtgaa
gtagatgctatatgtcaggtggttcgtgcgtttgacgatgagaatgtaacacatgtatca 
gggcgtgttaatccgcttgatgacatagaagtcattaatatggaacttgttttagcagat 
ttagaatctgttgaaaaacgtttaccgaaaatagagaagatggctcgtcaaaaagataaa 
acagctgagatggaattacgtatattaacacaaattaaagaagcgttagaagacggtaaa 
ccagtacgcagtattgatttcaatgaggatgatcaaaagtgggttaatcaagctcagtta
ttaacatctaagaaaatgttatacattgctaatgttggtgaagatgaaattggagataaa 
gataatgataaagtgaaagcaattcgtgaatatgcagcaaacgaagattcagaagttatc 
gttattagtgcaaaaatcgaggaagaaatcgctacattagatgatgaagataaagaaatg 
ttcttagaagatttaggcatcgaagaaccaggtttagacagactcattagaacaacatat
gatttgatcataaaaaggctttggcgtttacaccagaggataggggaaaagttaaggata
agtttggtgtctatgtga 

Sequence 518 

MVQPKKTIPTTFEFTDIAGIVKGASKGEGLGNKFLSHIREVDAICQVVRAFDDENVTHVS 
GRVNPLDDIEVINMELVLADLESVEKRLPKIEKMARQKDKTAEMELRILTQIKEALEDGK
PVRSIDFNEDDQKWVNQAQLLTSKKMLYIANVGEDEIGDKDNDKVKAIREYAANEDSEVI 
VISAKIEEEIATLDDEDKEMFLEDLGIEEPGLDRLIRTTYDLIIKRLWRLHQRIGEKLRI 
SLVSM* 

Sequence 519

Contig_0484_pos_4405_3587,

is similar to (with p-value 3.0e-49)

>gp:gp|AB001896|AB001896_2 Staphylococcus aureus DNA for sig
ma70 operon, complete cds. NID: g1943991.

gtgaaggataataatgaagtattaaagttatttatagtttcagattcaattggagaaaca
gcgcaacggatgattcatgcgacgctgacacagtttccagatttaactcaagtagaaatt 
aagaaatttccatatattaaggacgaacaagaatttttaaatgtcttacaattagctaaa 
gaacagaatgcaattgttgcaacaacattagtgagtgagtcatttaatgcattaggtcat 
cagtttgcaaatgaacatcaaattccctatgtagattacatgtctgagttaattagcata 
attaaacaacatacacacgctaaaccattaatggaaagtggtgcgttgcgtaagcttaat
gatgagtattttaagcgtatagaagcaattgagtattcagtgaaatatgatgatggtaag 
cattttacagatattggagaagcggatgctttaatagtaggtgtatcacgtacctctaaa 
acgccattaagtatgtacttagctaataaaggatataagattgcaaatattcctttagtc 
cctgaagtggctattccagataatgtatttcaacaaaagaatttaaaggtatttggatta
acagcaagtcccaattatatcgcaaatatacgacgtaatcgtgcagaaacattagggcta 
tcttcagaatctaattacaatagtttagagcgtatcaaaaaagaattatcttatgctgaa 
gaagtttttagaaaattaaatgcaacggtaattaatacagaatataaatcgatagaggaa 
tcggcattttatattgaaaagtttttagctaaacgttaa 

Sequence 520 

VKDNNEVLKLFIVSDSIGETAQRMIHATLTQFPDLTQVEIKKFPYIKDEQEFLNVLQLAK
EQNAIVATTLVSESFNALGHQFANEHQIPYVDYMSELISIIKQHTHAKPLMESGALRKLN 
DEYFKRIEAIEYSVKYDDGKHFTDIGEADALIVGVSRTSKTPLSMYLANKGYKIANIPLV
PEVAIPDNVFQQKNLKVFGLTASPNYIANIRRNRAETLGLSSESNYNSLERIKKELSYAE
EVFRKLNATVINTEYKSIEESAFYIEKFLAKR* 

Sequence 521 

Contig_0484_pos_1484_561,

putative peptide of unknown function

atgaaaaaattttggggaattttattaattgtgatgtcaattgctcttgtgggatgttcg 
aatagcaatgattcagatcaatcttctaatgaaaagtcatcatcaaaaagttcggagaaa 
aaaacggatgtggcgactgaatatacaaaagagaacgaatataaagaactagaaaaagaa 
gctaaggatcttaaacaaaagccagttcttaatgaaatcgatgcacttattacagaaaaa 

ggttttacaaacaaaacgggattgcaaggctgggaagactataaaaaattagtggataag
gtaacacttgcagattataaatacacaaaagaatctaaagggtcatctatagaagaagtt 
aataagttctttaaagataaaaaaggtgtagagattaaacgaatgaaaagtaaggaaaaa 
aatattaagcatatcaattatatgtatgtagatccagatggtaaaaaagcaggtaaagat 
aagcaacctatgtcctacgctcaaatacttgcaacatttaaagaaggtaaattagtagct 

acaaatattcaacctggattttttgctttagacaaaaagaaaatggttaaagctaaagac
ttagaaaaagttaagacattggaagatttaacgcgtttgaaagatcctaaagcgacatca 
tatggtattttacagacgaaatataaagggaaaccatacactcaagtttcaatattaggc 
agtgattctgatgaagagaatgatatttcctcagccatcttagcttattatctattttca 
ccaacggaattagatagtgacgataatcataaatacgttgaagttgcatcagcgccattc 

ttaagtgctcaaaacgatttttcatcttatcaactaggcgtatttaaaaaaattatcgaa
agtagtatgtcgttcgatgaataa 

Sequence 522 

MKKFWGILLIVMSIALVGCSNSNDSDQSSNEKSSSKSSEKKTDVATEYTKENEYKELEKE 
AKDLKQKPVLNEIDALITEKGFTNKTGLQGWEDYKKLVDKVTLADYKYTKESKGSSIEEV
NKFFKDKKGVEIKRMKSKEKNIKHINYMYVDPDGKKAGKDKQPMSYAQILATFKEGKLVA 
TNIQPGFFALDKKKMVKAKDLEKVKTLEDLTRLKDPKATSYGILQTKYKGKPYTQVSILG 
SDSDEENDISSAILAYYLFSPTELDSDDNHKYVEVASAPFLSAQNDFSSYQLGVFKKIIE 
SSMSFDE* 

Sequence 523 
Contig_0486_pos_255_887,
is similar to (with p-value 9.0e-20)
>sp:sp|P33642|YFIT_PSEAE HYPOTHETICAL 39.5 KD OXIDOREDUCTASE
IN FIMT 3' REGION (DADA*) (ORFZ). >gp:gp|L48934|PSEPILRV_2 P
seudomonas aeruginosa (isolate pRIC351) pilR gene, 3' end of
cds, dada*, fimT, fimU and pilV genes, complete cds. NID: g
1161217.

atgaagttaagagatattaagcgttatgagtctacagaggtcacttcaatagaacggcat 
aatggctattattcagtgaaaaccgatcaatcttcaacaattgaagcgcacaaaattatc
gttgcaggtggcgcatggtcttcgcaattattaacacaatatcatctacaacgacaagtg 
attggcgttaaaggtgaagttatcttattagaaaataacgatctttcacttactgagaca 
ttatttatgactaatggttgttacatcgttccaaaacaacccaatcgttttttaattggt 
gcgacgagtgaatttaataattattctgtcggtactacagatgaaggtatggattggctt 
cttcgccatgcatatcatcgtgtacctcaactaaaagacagtcatatactgaagaaatgg
tcaggagtaagaccatacacagaaaaagaaatgccagtcatggatcaaattgatgatggc 
ttatacgtgataagtggtcattatcgaaacggaatattattgtcacctattatcggtcgt 
gacattgccaattggctactttctggtattaaaccatcacgttattcaagttttacagtt 
acaaggaggaataatcatgaagtgtatcattaa

Sequence 524 

MKLRDIKRYESTEVTSIERHNGYYSVKTDQSSTIEAHKIIVAGGAWSSQLLTQYHLQRQV 
IGVKGEVILLENNDLSLTETLFMTNGCYIVPKQPNRFLIGATSEFNNYSVGTTDEGMDWL 
LRHAYHRVPQLKDSHILKKWSGVRPYTEKEMPVMDQIDDGLYVISGHYRNGILLSPIIGR
DIANWLLSGIKPSRYSSFTVTRRNNHEVYH* 

Sequence 525 

Contig_0486_pos_2086_2496,

putative peptide of unknown function

atggggactacacaagaattaccagtaaaaaccaaaagtttaaataagaaaaccattgag
caaaaagtttttcttattcgtaatgataatggtcaatatttacttgaaaagcgtaaagaa
aaacttcttaatggtatgtggcaatttccaatgagagaacaaacaaatgcaaacgatgtg 
atatctgatgatttaggaaaaagtatcgaaacaattaacgaaccagtatttaaattaaag 

catcaatttacccatcttacatgggaaattaaagtatacaatgttacagcacctcttaat
ataaaggaaaatgatttacctaaacaaatgacgtggtttaatttagatgatagggagcag 
tatatatttcccgtaccaatggataaaatatataagtttattgaaggttaa 

Sequence 526 

MGTTQELPVKTKSLNKKTIEQKVFLIRNDNGQYLLEKRKEKLLNGMWQFPMREQTNANDV
ISDDLGKSIETINEPVFKLKHQFTHLTWEIKVYNVTAPLNIKENDLPKQMTWFNLDDREQ 
YIFPVPMDKIYKFIEG* 

Sequence 527 
Contig_0487_pos_6312_5665,

putative peptide of unknown function 

gtgactaagacagacttatctcatttgcacaacattacaggcattcctctcaatacattg 
tggtaccaaaaggaacgtggcacatataacgataaattgaagtgcttctttacggacaca 
atgccgagagtgaataagaaacaagagtttaacgaaagagttgtagcaaaagatgaaatt 

tggaagtacagcgagaagtatgacttatacgtaagcaacttaggcagaatgaaaagacct
gatggaaaatacaagtttgcgaatggttgtaagggtattttcacagttatttataagaat
aagaagtatcgtgcagcagatattgtgtatgaaacgtttatcggtaacttgaaaaacgga 
ttgcacgcatatccgaaagatagtagatacaacaactttatttcagataacttattccaa 
tctacattacagaaatatagattgtatcgcagaaataaaggtgtatccaaaccagtatac 

ctagtggatagcgacaacaaaattgtagaagaattcgcaagtacagtagaagctggaaaa
gtattattcatcgacagacgcaacattgctagaaagtgcaaccgtagatatgtgagtgac 
gggttgatgtacatgtgggctgatgagtacgagaaggtaaatgcatga 

Sequence 528 

VTKTDLSHLHNITGIPLNTLWYQKERGTYNDKLKCFFTDTMPRVNKKQEFNERVVAKDEI
WKYSEKYDLYVSNLGRMKRPDGKYKFANGCKGIFTVIYKNKKYRAADIVYETFIGNLKNG
LHAYPKDSRYNNFISDNLFQSTLQKYRLYRRNKGVSKPVYLVDSDNKIVEEFASTVEAGK
VLFIDRRNIARKCNRRYVSDGLMYMWADEYEKVNA*

Sequence 529

Contig_0487_pos_4865_4452,

putative peptide of unknown function 

atgtggaacgtagaaacgatttatatcgaagatgaatgggttaaagttaatgacggttcg 
atatacggaattacaaaggatttagttagagattatgtattaatgcaatcaacaggctta 
aaagataagaacggtgtagagatatacgagggggacatcatcgaatttgaggatgaatct
ttttgttatccattcgatgatgaagctatagttgaaacaataaatagagcacaggtaatt 
atagataaggttaaaggtatttttttggaaaactttatggtaaaggacagtacgattgct 
aaagaatataaatattattatgatttgccaacatctgaaaaaacaatattttttaaagaa
tgtagtgttgtaggtaatgtgtttgaagatgaaaatttactggaggacgagtaa 
Sequence 530

MWNVETIYIEDEWVKVNDGSIYGITKDLVRDYVLMQSTGLKDKNGVEIYEGDIIEFEDES
FCYPFDDEAIVETINRAQVIIDKVKGIFLENFMVKDSTIAKEYKYYYDLPTSEKTIFFKE
CSVVGNVFEDENLLEDE*

Sequence 531 

Contig_0488_pos_7376_8941,
is similar to (with p-value 0.0e+00)
>sp:sp|P44023|YFCC_HAEIN HYPOTHETICAL PROTEIN HI0594. >pir:p
ir|E64010|E64010 hypothetical protein HI0594 - Haemophilus i
nfluenzae (strain Rd KW20) >gp:gp|U32741|U32741_2 Haemophilu
s influenzae Rd section 56 of 163 of the complete genome. NI
D: g1573582.

atgaaaccgttggaacaagcgatcaatgataataaaaagaaaaaacgttttaactttaga
atgccaggtgcatttatgattctctttatcctaacagttgtcgcagttatagcaacttgg 
ataatccccgcgggtgcatactcaaaactttcatatgaaccttcatcccaagaattaaaa 
attgtcaatcctcatcatcaagtaaaaaaagttcctggaacacaaaaggagcttgatcga 
ttaggagttaaaatcaaaatagaacaatttaaatctggtgcaattaataaacccgtttca 

attcctaatacttacgaacgtctaaaacaacatccagctggtcttgatcaaattactagt
agcatggttaaaggaaccatcgaagccgtcgatattatggtctttatacttgttctaggt 
ggactgattggtattgttcaagcgagcgggtcctttgaatcaggattgttagcacttact 
caaaaaacgaaaggccacgaatttatgttgattatgttcgtagcaattttaatgattctg 
ggtggaacactatgtggcattgaagaggaagctgtagcgttctatcctgtactcgttcca 

atatttattgcgcttggatatgattctattgtctcagtcggtgcaattttcttagcaagc
tctgtgggtagtacattctcaacaatcaacccattctcagtcgtcattgcttctaatgca 
gcaggaacaacttttactgatggtctttattggagaataggcgcttgtatcatcggtgcc 
atatttgttattagttatttattctggtattgtaaaaaaattaaaaaagatcctaaatcc
tcttattcttatgaagacaaagcagcatttgaaaaacagtggtctgtgctccatgatgac 

ggttcttctgagtttacattacgtaaaaagattattcttacgcttttcgtcctaccattc
cctattatggtttggggcgtcatgacacaaggatggtggttcccagtcatggcatctgca 
ttcttgatctttaccattgtcatcatgtttattgctggaacaggacaatatggtttaggc 
gaaaaaggcactgtagatgcattcgttaatggcgcttcaagtttagtaggtgtatcttta 
atcattggtttagctcgaggaatcaacttagtattgaataaaggaatgatttctgacaca 

atcttgcacttttcatcatctatcgtgcaacatatgagtgggcctttatttatcattgtt
ctgctctttatctttttctgtttaggatttatcgtgccgtcctcatcaggattagcagta 
ctatctatgcctatctttgcgccattagctgatacagtaggtataccaagatttgttatt 
gttacaacatatcaattcggtcagtatgcaatgttgttcttagcgcctactggacttgta 
atggcaacacttcaaatgttaaacatgcgctactcacactggttacgtttcgtatggcct 

gttgtcgcgtttgttttaatatttggtggaggcttacttattacacaagttttaatatac
tcataa 

Sequence 532 

MKPLEQAINDNKKKKRFNFRMPGAFMILFILTVVAVIATWIIPAGAYSKLSYEPSSQELK
IVNPHHQVKKVPGTQKELDRLGVKIKIEQFKSGAINKPVSIPNTYERLKQHPAGLDQITS
SMVKGTIEAVDIMVFILVLGGLIGIVQASGSFESGLLALTQKTKGHEFMLIMFVAILMIL 
GGTLCGIEEEAVAFYPVLVPIFIALGYDSIVSVGAIFLASSVGSTFSTINPFSVVIASNA
AGTTFTDGLYWRIGACIIGAIFVISYLFWYCKKIKKDPKSSYSYEDKAAFEKQWSVLHDD 
GSSEFTLRKKIILTLFVLPFPIMVWGVMTQGWWFPVMASAFLIFTIVIMFIAGTGQYGLG 
EKGTVDAFVNGASSLVGVSLIIGLARGINLVLNKGMISDTILHFSSSIVQHMSGPLFIIV
LLFIFFCLGFIVPSSSGLAVLSMPIFAPLADTVGIPRFVIVTTYQFGQYAMLFLAPTGLV
MATLQMLNMRYSHWLRFVWPVVAFVLIFGGGLLITQVLIYS*

Sequence 533 
Contig_0488_pos_9616_9212,

putative peptide of unknown function 

atgggaggttatggtgcaatcaaatttgcattaacgcaaagttatcgtttctcaaaagcc 
gctatgctttcagcgccatatgatgtttctatgattggtcaatatcaatggtatgatttt 
actccagaagcgattgtaggtaatacgcaacatgtcgcggggacatcttttgatccatac 
tatttagttgaacaagcaatagacaatggacaaacgttaccacaactatatattacttgt
ggaactgaagatgaattgtatcaaggtaatattgattttgtgaactatttagatgaaaaa 
ggtatttcatatcaatttaaaaaagcgccaggtcatcacgattatgcattttgggataaa 
gcaatagaagatgtcattgaccgttttacatcatcacatatttaa 

Sequence 534 

MGGYGAIKFALTQSYRFSKAAMLSAPYDVSMIGQYQWYDFTPEAIVGNTQHVAGTSFDPY 
YLVEQAIDNGQTLPQLYITCGTEDELYQGNIDFVNYLDEKGISYQFKKAPGHHDYAFWDK
AIEDVIDRFTSSHI*

Sequence 535 

Contig_0488_pos_8631_8224,

is similar to (with p-value 1.0e-35) 

>sp:sp|P44023|YFCC_HAEIN HYPOTHETICAL PROTEIN HI0594. >pir:p
ir|E64010|E64010 hypothetical protein HI0594 - Haemophilus i
nfluenzae (strain Rd KW20) >gp:gp|U32741|U32741_2 Haemophilu
s influenzae Rd section 56 of 163 of the complete genome. NI
D: g1573582.

atgataaataaaggcccactcatatgttgcacgatagatgatgaaaagtgcaagattgtg 
tcagaaatcattcctttattcaatactaagttgattcctcgagctaaaccaatgattaaa
gatacacctactaaacttgaagcgccattaacgaatgcatctacagtgcctttttcgcct 
aaaccatattgtcctgttccagcaataaacatgatgacaatggtaaagatcaagaatgca 
gatgccatgactgggaaccaccatccttgtgtcatgacgccccaaaccataatagggaat 
ggtaggacgaaaagcgtaagaataatctttttacgtaatgtaaactcagaagaaccgtca 
tcatggagcacagaccactgtttttcaaatgctgctttgtcttcataa

Sequence 536 

MINKGPLICCTIDDEKCKIVSEIIPLFNTKLIPRAKPMIKDTPTKLEAPLTNASTVPFSP
KPYCPVPAINMMTMVKIKNADAMTGNHHPCVMTPQTIIGNGRTKSVRIIFLRNVNSEEPS 
SWSTDHCFSNAALSS*

Sequence 537 

Contig_0488_pos_4443_4015,
is similar to (with p-value 2.0e-50)
>pir:pir|S58181|S58181 fosB protein - Staphylococcus sp. >gp
:gp|X89875|SSPPDNAFB_1 Staphylococcus sp. plasmidic DNA for
fosB gene. NID: g927563.

atggaaataacaaatgttaatcatatttgtttttcagtgagtgatttaaatacctctata 
caattttataaagatattttacatggtgacttattagtatcagatagaacgacagcatat 

ttaactattggtcatacttggattgcactgaatctagaaaaaaatataccaaggaatgaa
ataagtcattcctatacgcacgttgctttctccatagatgaagaagattttcaacagtgg 
attcaatggcttaaagagaatcaagtaaatattttaaaagggcgaccaagagacattaaa 
gacaaaaaatcgatatattttacagatctggatgggcataaaattgaattacatactgga 
acattaaaagatagaatggaatattataaatgtgagaagacgcatatgcaattttacgat 

gagttttga

Sequence 538 

MEITNVNHICFSVSDLNTSIQFYKDILHGDLLVSDRTTAYLTIGHTWIALNLEKNIPRNE 
ISHSYTHVAFSIDEEDFQQWIQWLKENQVNILKGRPRDIKDKKSIYFTDLDGHKIELHTG 
TLKDRMEYYKCEKTHMQFYDEF*

Sequence 539 

Contig_0488_pos_2775_1777,
is similar to (with p-value 0.0e+00)
>sp:sp|P53557|BIOB_BACSU BIOTIN SYNTHETASE (EC 2.8.1.-). >gp
:gp|AF008220|AF008220_77 Bacillus subtilis rrnB-dnaB genomic
region. NID: g2293135. >gp:gp|U51868|BSU51868_5 Bacillus su
btilis biotin biosynthetic operon genes, complete and partia
l cds. NID: g1277024. >gp:gp|Z99119|BSUB0016_93 Bacillus sub
tilis complete genome (section 16 of 21): from 2997771 to 32
13410. NID: g2635411.

atgctaatttttaagaaaaaggagttaaagattatgacattaaacctagctcaacgtgtg 
ttaaatcaagagtcattaacaaaagatgaagcaatatctattttcgaaaatgctgaaatt 
gatacatttgatttattaaatgaagcctacacagtgagaaaacattactatggtaaaaaa
gttaagcttaatatgatattaaatgctaaaagtggtatctgtgcagaagattgtgggtac 
tgtgggcaatctgtaaaaatgaaagaaaagcaacgttatgcacttgttgaacaggaccaa 
attaaagaaggcgctcaagtggcaactgaaaatcaaatcggtacatactgtattgttatg 
agtggtagaggtcctagtaacagagaagtcgatcatatttgcgaaacagtagaagatatt 

aaaaagatacacccacaactaaagatttgtgcgtgcttaggattaacgaaagaagaacag
gctaaaaaattaaaggctgctggtgtcgatcgttataatcataatttaaatacgagtgag 
cgttatcacgatgaagtagtaactacacatacatatgaggatagagtgaatacggttgaa 
atgatgaaagataataatatttctccttgttcaggtgtgatatgtggtatgggagagtcg 
aatgaggacattattgatatggcatttgctttaagagccatcgatgctgatagcattcct

attaattttttacatcctattaaaggaactaaatttggtggattagatttattgtcacca
atgaaatgtttaagaattatagcgatgtttaggttaatcaatccaacaaaagaaattcga 
attgcaggtggacgggaggtaaatctacgttcattacaaccactcgcattgaaagcggct 
aattcaatttttgtaggagattacttaattacaggcggtcaaccgaatgaggaagattat 
cgcatgattgaagatttagggtttgaaatcgacagttaa 

Sequence 540 

MLIFKKKELKIMTLNLAQRVLNQESLTKDEAISIFENAEIDTFDLLNEAYTVRKHYYGKK 
VKLNMILNAKSGICAEDCGYCGQSVKMKEKQRYALVEQDQIKEGAQVATENQIGTYCIVM 
SGRGPSNREVDHICETVEDIKKIHPQLKICACLGLTKEEQAKKLKAAGVDRYNHNLNTSE 
RYHDEVVTTHTYEDRVNTVEMMKDNNISPCSGVICGMGESNEDIIDMAFALRAIDADSIP
INFLHPIKGTKFGGLDLLSPMKCLRIIAMFRLINPTKEIRIAGGREVNLRSLQPLALKAA 
NSIFVGDYLITGGQPNEEDYRMIEDLGFEIDS* 

Sequence 541 
Contig_0488_pos_1109_369,

is similar to (with p-value 1.0e-70) 

>sp:sp|P32816|GLDA_BACST GLYCEROL DEHYDROGENASE (EC 1.1.1.6)
(GLDH). >pir:pir|JQ1474|JQ1474 glycerol dehydrogenase (EC 1
.1.1.6) - Bacillus stearothermophilus >gp:gp|M65289|BACGLDA_
2 Bacillus stearothermophilus glycerol dehydrogenase (propos
ed gld) gene, complete cds. NID: g142976.

atggatgcaccaacagcagcagtatctgttatttataacgaagatggatcatttagtggt 
tatgaattctaccctaaaaaccctgatacagttatcgtagattctgaaattgttgcacaa 
gcacctgtacgtttatttgcatcaggtatgagtgatggtttagcaacattaatcgaagtt 

gaatctacacttcgtagacaagggcaaaacatgttccatggcaaacctacattagcaagt
ttagcaatcgctcaaaaatgtgaagaggttatttttgaatatggttacagtgcttatact 
tctgtagaaaaacatatcgtgacaccacaagtagatgctgtgattgaagccaatacatta 
ctttcaggtttaggatttgaaaacggcggattagcaggtgcacacgcaattcataatgga 
ttcacagctttagaaggggatatccaccacttaactcatggtgaaaaagtggcatacggt 

attttagtacaattagtacttgaaaatgcgccaactgaaaaattcatgaaatacaaaaca
ttcttcgataatatcaatatgccaacaacattagaaggtcttcacattgaaaacacaagt 
tatgaagaattagttcaagtaggtgaacgtgcattaacaccaaatgatacgtttgctaac 
ttaagtgataaaatcactgctgatgaaatcgcagacgcaattttaactgttaatgattta 
tctaaaagtcagttcaactaa 

Sequence 542 

MDAPTAAVSVIYNEDGSFSGYEFYPKNPDTVIVDSEIVAQAPVRLFASGMSDGLATLIEV
ESTLRRQGQNMFHGKPTLASLAIAQKCEEVIFEYGYSAYTSVEKHIVTPQVDAVIEANTL 
LSGLGFENGGLAGAHAIHNGFTALEGDIHHLTHGEKVAYGILVQLVLENAPTEKFMKYKT
FFDNINMPTTLEGLHIENTSYEELVQVGERALTPNDTFANLSDKITADEIADAILTVNDL
SKSQFN*

Sequence 543 

Contig_0488_pos_0_353,

is similar to (with p-value 5.0e-26)

>pir:pir|S48578|S48578 hypothetical protein - Mycoplasma cap
ricolum (SGC3) (fragment) 

atgaaaaagttaattcaagataaaaacacaattttaaaagatatgcttgatggaattaca 
gtttcaaacaacgatgttgaagttgtatctgacactattgttgttagaaagcataaaaaa
caatcaggtgttgcactcgtttctgggggcggcagtggacatgaacctgcacacgcagga 
tttgtagcagaaggcatgctcgatgcagctgtatgtggagaaatcttcacttcacctaca 
cctgataaaatattagatgccattaaagctgtggacaatggtgacggcgttctacttgtt 
attaaaaactatgcaggagacgttatgaactttgaaatggctcaagaaatggc 

Sequence 544 

MKKLIQDKNTILKDMLDGITVSNNDVEVVSDTIVVRKHKKQSGVALVSGGGSGHEPAHAG 
FVAEGMLDAAVCGEIFTSPTPDKILDAIKAVDNGDGVLLVIKNYAGDVMNFEMAQEMA

Sequence 545

Contig_0489_pos_142_1740,

is similar to (with p-value 0.0e+00) 

>sp:sp|P17894|RECN_BACSU DNA REPAIR PROTEIN RECN (RECOMBINAT
ION PROTEIN N). >pir:pir|B35128|B35128 recN homolog - Bacill
us subtilis >gp:gp|D84432|BACJH642_227 Bacillus subtilis DNA
, 283 Kb region containing skin element. NID: g2627063. >gp:
gp|M30297|BACRECN_2 B. subtilis recombination and sporulation
protein (recN, spoIVB) genes, complete cds, arginine hydro
ximate resistance (ahrC) gene, 3' end. NID: g143400. >gp:gp|
Z99116|BSUB0013_135 Bacillus subtilis complete genome (secti
on 13 of 21): from 2395261 to 2613730. NID: g2634723.
atgagtggtgaaactggctcaggaaaatctatcattattgatgccattggacagttaatc 
ggtatgagagcttcttctgattacgtcagacatggtgaaaagaaagcaattatcgaaggt
atctttgatatagacgagagtaaagacgcaattaatatactagaatcattagctatagat 

gttgatgaagattttttattagttaaaagagaaattttcagttctggtaagagtatttgt
cgtattaataaccaaactgtcactctacaggacttaagaaaagtgatgcaagaactgctt 
gatattcatggtcaacatgaaacgcaatctttacttaagcaaaaatatcatcttcaacta 
ttagatgattatgcagacaatcagtattcagatttacttaatcaatatcaactttcttat 
aaccaatataaaaataaacgtaaagaattagaggaattagaatccgcggaccaggcttta 

ttacaacgattagacttaatgaaatttcaattagaggaactaaccgaagcttcactgaaa
gaaggcgaagtggaccaacttgaatccgatattaaaagaattcaaaactccgaaaaatta 
aatctagctttaaacaatgcacatcaagttctaactgatgaaagtgcaatacccgatagg 
ttgtacgaattaagcaactacttgcaaacgattaatgatatcgttccagaaaaattcgta 
agattaaaagaggacattgatcaattttactatatgctagaagatgcaaagcatgaaatt 

tacgacgaaatggctaacactgaattcgatgagcaagttttaaatgagtatgaatccaga
atgaatttacttaataatttaaaacgtaaatatggtaaggatattactgaacttattgct 
tatcagagtaaacttgcaaatgaaattgataaaatagaaaactatgaacaaagtacatca 
caattaagggaagaaattaaaacgctttataacgaagtgatagatataggaaaaaaactt 
tctcaagaacgtaggcgtgtagcgagagagttaagggaccatattgtttctgaaatacaa 

aatttacaaatgaaagatgctaaccttgaaatttcgtttaaaccattagatgaacctaca
attgaaggtattgaatttgtggaatttttaattagtccaaatcgtggtgaaccacttaaa 
agtcttaataaaatcgcttcaggtggtgaactttcaagaattatgcttgctctaaaaagt 
atatttgttaaatcacgcggccaaaccgcgattctttttgatgaagttgactcgggtgta 
tctggtcaagcagcacaaaaaatggctgaaaaaatgcgagatattgctcaatatatacaa 

gttatttgtatttcacacttacctcaggtagcttcaatgagtgaccatcatcttctaata
agcaaggcatccaatgccgatagaactacaactcaagtcaaagaattgaaagatgaaaac 
aaaatagatgaaatagcacgtatgatttcaggagcaagtgtgactgagctcacgagagaa 
aatgcaaaagaaatgattaagcaaaatcacaatatttaa 

Sequence 546

MSGETGSGKSIIIDAIGQLIGMRASSDYVRHGEKKAIIEGIFDIDESKDAINILESLAID
VDEDFLLVKREIFSSGKSICRINNQTVTLQDLRKVMQELLDIHGQHETQSLLKQKYHLQL
LDDYADNQYSDLLNQYQLSYNQYKNKRKELEELESADQALLQRLDLMKFQLEELTEASLK 
EGEVDQLESDIKRIQNSEKLNLALNNAHQVLTDESAIPDRLYELSNYLQTINDIVPEKFV 
RLKEDIDQFYYMLEDAKHEIYDEMANTEFDEQVLNEYESRMNLLNNLKRKYGKDITELIA
YQSKLANEIDKIENYEQSTSQLREEIKTLYNEVIDIGKKLSQERRRVARELRDHIVSEIQ 
NLQMKDANLEISFKPLDEPTIEGIEFVEFLISPNRGEPLKSLNKIASGGELSRIMLALKS 
IFVKSRGQTAILFDEVDSGVSGQAAQKMAEKMRDIAQYIQVICISHLPQVASMSDHHLLI 
SKASNADRTTTQVKELKDENKIDEIARMISGASVTELTRENAKEMIKQNHNI*

Sequence 547 

Contig_0489_pos_2000_3421,

is similar to (with p-value 0.0e+00) 

>sp:sp|P54533|DLD2_BACSU LIPOAMIDE DEHYDROGENASE COMPONENT (
E3) OF BRANCHED-CHAIN ALPHA-KETO ACID DEHYDROGENASE COMPLEX
(EC 1.8.1.4) (DIHYDROLIPOAMIDE DEHYDROGENASE) (LPD-VAL).
atgtcagaaaaacaatacgatttagtcgtgttaggtggtggtacggcaggatatgtagcc 
gccatcagagcttctcaattaggaaaaaaagtagcgatagtagaaaaatcactcttaggt 

ggtacgtgtttacataaaggatgtatacctactaaagcacttttaaaatcggctgaagtc
aatcatactattaaaaacgcgcatacatttggaattgatgtcaatcattttaaaattaat 
ttccctaaaattttagaacgtaaagatgctattgttaagcaattgcatgaaggcgtcaat
caactgatgaaacatcatcatatagatatttataacggtattggacgaattatgggaaca 
tctatattttctcctcaaagcggtacaatttctgtggaatatgaagacggcgaatcagat 

atactccctaataaaaatgtgcttatagctactgggtcatcaccacagtctcttccgttc
attaaatttgaccataaacaaatactatcgagtgatgatatcctaaggttaaatacacta 
ccacaaagattagcaatcataggtggaggtgttattggtttagaatttgcatctctgatg 
aatgatttaggtgctgatgtagtagtaatcgaagcgaatgacagagttcttcctaccgag 
agcacacaagttgcgtcattgctaaaagaagaattaactaatcgaggcgttacattctac 

gaaaatattcaattgaccaaagatcattttaaccaaactgataagggtgtaactattaat
atttcagatgagcccgtccaattcgataaagtacttgttgcaattggtagaaagcctaat 
acaaatgatattggtttaaataacactcaaattaagacttctgatgctggtcatattata 
acaaatggttatcagcaaactgaagataaacatatatacgcagcaggagattgtataggg 
caattacaattggcacacgtcggttcaaaagaagctatagttgcagttgaacatatgttt 

gattgttctcctatacctatcaattatgacctgataccaaaatgtgtttatacaaaccca
gaaattgcttcaattggtaaaaatttagaacaagcaaaaaaagcaggcatcaaagcaaaa 
agtatcaaagttccttttaaagctataggaaaggcaataattgaggatgtaacccaatca 
aaaggattttgcgagatggtagttaacaaagatgacgatgaaatcataggtcttaatatg 
atagggccacatgttacagaattaataaatgaaatttcattgttacaatttatgaatggc 

tcatctttagaacttggtttaacaacacatgcacatccttcattatccgaggtagtcatg
gaattaggtttaaaagctaatggtcaagcaattcatgtatag 

Sequence 548 

MSEKQYDLVVLGGGTAGYVAAIRASQLGKKVAIVEKSLLGGTCLHKGCIPTKALLKSAEV 
NHTIKNAHTFGIDVNHFKINFPKILERKDAIVKQLHEGVNQLMKHHHIDIYNGIGRIMGT
SIFSPQSGTISVEYEDGESDILPNKNVLIATGSSPQSLPFIKFDHKQILSSDDILRLNTL 
PQRLAIIGGGVIGLEFASLMNDLGADVVVIEANDRVLPTESTQVASLLKEELTNRGVTFY
ENIQLTKDHFNQTDKGVTINISDEPVQFDKVLVAIGRKPNTNDIGLNNTQIKTSDAGHII 
TNGYQQTEDKHIYAAGDCIGQLQLAHVGSKEAIVAVEHMFDCSPIPINYDLIPKCVYTNP 
EIASIGKNLEQAKKAGIKAKSIKVPFKAIGKAIIEDVTQSKGFCEMVVNKDDDEIIGLNM
IGPHVTELINEISLLQFMNGSSLELGLTTHAHPSLSEVVMELGLKANGQAIHV* 

Sequence 549

Contig_0489_pos_3436_4434,

is similar to (with p-value 2.0e-51)

>gp:gp|AF012285|AF012285_33 Bacillus subtilis mobA-nprE gene
region. NID: g3282109. >gp:gp|Z99111|BSUB0008_130 Bacillus
subtilis complete genome (section 8 of 21): from 1394791 to
1603020. NID: g2633699.

atgatagattacaagtcagcaggccttacagaagaagacctcaaaaaaatatataaatgg
atggacttaggaagaaaaacagacgaaaggctatggttactcaatcgtgcaggtaaaatt 
ccatttgttgtcagtggtcaagggcaggaagcaactcaaattggtatggcatatgcaatg 
caaaaaggtgatatctcatcaccttattatcgtgatttagcatttgtcacttatatggga 
atttctccattggatactatgttatcagcttttggaaaacgtgatgacattaactcagga 
ggtaaacaaatgccttctcattttagtcacaaagaaaaaggcattttatctcaaagttct
ccagtagccactcaaataccacattctgtcggtgctgcattagcacttaaaatggataac 
aagccaaatattgctaccgcaacagttggagaaggcagttcaaatcaaggtgactttcac 
gaaggtatgaactttgctgcagttcacaaattacctttcgtctgtgtaataattaacaat 
aaatatgcgatatctgtaccagattcactacaatatgctgctgaaaagttatcagatcgt
gcattaggttacggtatgcatggaatacaggtagatggaaatgacccaattgcagtatac 
aaagcgatgaaagaagcaagagaacgagcgctagcaggtgaaggtccaacattgatagaa 
gctgtcacttcacgtatgacaccacattcatctgatgatgatgatacatatcgtacaaaa 
gaagaaagagacctattgaaacaagaggattgtaatataaaatttaaaacggccttactc
gatcaaggcatcataaacgaaaattggttgagtcaattggaaaaagagcataaagaactc
attaatgaagctactaaatctgctgaagcagcaccatatccttcagaagaagaagctttg 
acatatgtttatgaagagggaggtcaacgaaatgactaa 

Sequence 550 

MIDYKSAGLTEEDLKKIYKWMDLGRKTDERLWLLNRAGKIPFVVSGQGQEATQIGMAYAM
QKGDISSPYYRDLAFVTYMGISPLDTMLSAFGKRDDINSGGKQMPSHFSHKEKGILSQSS
PVATQIPHSVGAALALKMDNKPNIATATVGEGSSNQGDFHEGMNFAAVHKLPFVCVIINN
KYAISVPDSLQYAAEKLSDRALGYGMHGIQVDGNDPIAVYKAMKEARERALAGEGPTLIE 
AVTSRMTPHSSDDDDTYRTKEERDLLKQEDCNIKFKTALLDQGIINENWLSQLEKEHKEL 

INEATKSAEAAPYPSEEEALTYVYEEGGQRND*

Sequence 551 

Contig_0489_pos_5558_6742,
is similar to (with p-value 4.0e-52)
>gp:gp|L25604|BACBMRURBE_4 Bacillus subtilis bmrU, multidrug
efflux transporter (bmr) and its regulator (bmrR) genes, co 
mplete cds, and branched-chain 2-oxo acid dehydrogenase (bfm 
B) gene, 3' end. NID: g2558636. 

gtgccttcaacaatttctggaacaataacagaattagtggttgaagaaggacaaactgtc 

aatattaacacggtgatttgtaaaatcgattcggaaaatggtcaaaatcaaacagaatcg
gcaaatgagtttaaggaagaacaaaatcagcattctcaatcaaatataaacgtgtcacaa 
ttcgaaaataatcctaaaactcatgaaagtgaggtgcatacagcctctagtcgcgcaaat 
aacaatggacgattttcaccagttgtctttaaattagcttctgaacatgatattgattta 
acacaagtcaaaggaactggttttgaaggtcgtgttactaagaaagatattcaaaatatt 

attaacaatccaaacgatcaagaaaaagagaaagaatttaaacaaacagataaaaaagat
cattcaacgaaccattgtgactttttacatcaatcctcaactaaaaacgaacactcacca 
ttatcaaatgaacgtgtcgtaccagttaaaggtattagaaaagctatcgcacaaaatatg 
gttactagtgtcagcgaaataccacacggttggatgatggttgaagctgatgcaacgaat 
ttggttcagactagaaactatcataaagctcaatttaaacagaatgagggttacaattta 

actttctttgcgttttttgtaaaagctgttgcagaggctttaaaagtaaatccattactc
aatagtacatggcaaggagatgaaattgttatccacaaagatattaatatctctattgct 
gttgcagacgatgataagttgtatgtgccagtcattaaaaatgcagatgaaaaatcaatt 
aaaggtatcgcgcgtgaaatcaatgatttagctactaaagcaagattaggaaaattagca 
caaagtgatatgcaaaacggtacatttacggttaataatactggttcttttggttctgtt 

tcttcaatgggaatcattaatcatccacaagctgccattttacaagtagaatcagtcgtt
aagaaacctgtagttatagatgatatgattgcaattagaaatatggttaatttgtgtatt 
tcaatcgatcatcgtattctcgatggtgttcaaacgggaaaatttatgaatcttgttaag 
aaaaaaatagaacaatattctattgaaaacacttctatttattaa 

Sequence 552

VPSTISGTITELVVEEGQTVNINTVICKIDSENGQNQTESANEFKEEQNQHSQSNINVSQ 
FENNPKTHESEVHTASSRANNNGRFSPVVFKLASEHDIDLTQVKGTGFEGRVTKKDIQNI
INNPNDQEKEKEFKQTDKKDHSTNHCDFLHQSSTKNEHSPLSNERVVPVKGIRKAIAQNM 
VTSVSEIPHGWMMVEADATNLVQTRNYHKAQFKQNEGYNLTFFAFFVKAVAEALKVNPLL

NSTWQGDEIVIHKDINISIAVADDDKLYVPVIKNADEKSIKGIAREINDLATKARLGKLA
QSDMQNGTFTVNNTGSFGSVSSMGIINHPQAAILQVESVVKKPVVIDDMIAIRNMVNLCI
SIDHRILDGVQTGKFMNLVKKKIEQYSIENTSIY*

Sequence 553 
Contig_0489_pos_6539_6213,

is similar to (with p-value 9.0e-29)

>gp:gp|L25604|BACBMRURBE_4 Bacillus subtilis bmrU, multidrug
efflux transporter (bmr) and its regulator (bmrR) genes, co
mplete cds, and branched-chain 2-oxo acid dehydrogenase (bfm
B) gene, 3' end. NID: g2558636.

atgattcccattgaagaaacagaaccaaaagaaccagtattattaaccgtaaatgtaccg 
ttttgcatatcactttgtgctaattttcctaatcttgctttagtagctaaatcattgatt 
tcacgcgcgatacctttaattgatttttcatctgcatttttaatgactggcacatacaac 
ttatcatcgtctgcaacagcaatagagatattaatatctttgtggataacaatttcatct
ccttgccatgtactattgagtaatggatttacttttaaagcctctgcaacagcttttaca 
aaaaacgcaaagaaagttaaattgtaa 

Sequence 554 

MIPIEETEPKEPVLLTVNVPFCISLCANFPNLALVAKSLISRAIPLIDFSSAFLMTGTYN
LSSSATAIEILISLWITISSPCHVLLSNGFTFKASATAFTKNAKKVKL* 

Sequence 555 

Contig_0489_pos_3383_3072,

is similar to (with p-value 6.0e-17)

>sp:sp|P54533|DLD2_BACSU LIPOAMIDE DEHYDROGENASE COMPONENT (
E3) OF BRANCHED-CHAIN ALPHA-KETO ACID DEHYDROGENASE COMPLEX
(EC 1.8.1.4) (DIHYDROLIPOAMIDE DEHYDROGENASE) (LPD-VAL).
atgactacctcggataatgaaggatgtgcatgtgttgttaaaccaagttctaaagatgag 

ccattcataaattgtaacaatgaaatttcatttattaattctgtaacatgtggccctatc
atattaagacctatgatttcatcgtcatctttgttaactaccatctcgcaaaatcctttt 
gattgggttacatcctcaattattgcctttcctatagctttaaaaggaactttgatactt 
tttgctttgatgcctgctttttttgcttgttctaaatttttaccaattgaagcaatttct 
gggtttgtataa 

Sequence 556 

MTTSDNEGCACVVKPSSKDEPFINCNNEISFINSVTCGPIILRPMISSSSLLTTISQNPF
DWVTSSIIAFPIALKGTLILFALMPAFFACSKFLPIEAISGFV*



Sequence 557

Contig_0490_pos_4295_4798,

is similar to (with p-value 3.0e-46)

>sp:sp|P42876|UREF_STAXY UREASE ACCESSORY PROTEIN UREF. >gp:
gp|Z35136|SXUREFG_1 S.xylosus (C2a) UreF and UreG genes. NID
: g511068.

atgagaattgtctaccacgcattaattaacaatgacaaagataaaattttagatattaac 
caaaaactcttcgtacaaaatctacctaaagaaacgcgtattggcgctaagcaaatgggt 
acacgcatggtaaaattagctttagatctttatgatagtgaatggattcaatggtattat 
aatcaaatgaaaaacaataaaattaagcttcatcctgctgtgtgctttactatgctagga 
cattttttaggtgtagatgtggaatccatcattgattattatttatatcaaaatatctct
agccttacccaaaatgcagtaagagcgattcctttaggacaaacagctggacagcaagtc 
gtaactgaaatgatagcccatattgagaagacacgacatcacatactagaattggacgaa 
atcgattttggtatgactgctcccggcttggaacttaatcaaatggaacatgaaaatgtt 
catgttcgaatctttatttcatag 

Sequence 558 

MRIVYHALINNDKDKILDINQKLFVQNLPKETRIGAKQMGTRMVKLALDLYDSEWIQWYY 
NQMKNNKIKLHPAVCFTMLGHFLGVDVESIIDYYLYQNISSLTQNAVRAIPLGQTAGQQV
VTEMIAHIEKTRHHILELDEIDFGMTAPGLELNQMEHENVHVRIFIS*

Sequence 559 

Contig_0490_pos_4880_5425,

is similar to (with p-value 5.0e-91) 

>sp:sp|P42877|UREG_STAXY UREASE ACCESSORY PROTEIN UREG. >gp:
gp|Z35136|SXUREFG_2 S.xylosus (C2a) UreF and UreG genes. NID
: g511068.

gtggttaaacgccttgcgaaaaaaatgagtattggcgttattactaatgatatctatact 
aaagaagatgaaaaaatactagttaatacaggtgttttaccagaagatagaattatcggt 
gtggaaactggaggttgtcctcatacagctattcgtgaagacgcctcaatgaacttcgca
gccatagatgaattattagaacgtaatgatgatattgaacttatttttattgaatcaggt 
ggcgataacttagcggctacttttagtccagaactcgttgacttttcaatttatatcatt 
gatgttgctcagggcgaaaagattccacgtaaaggtggacaaggtatgattaaatctgat 
ttcttcattattaataaaactgaccttgcaccatatgtgggtgcttcattagatcaaatg 
gctaaagatactgaagtatttcgtggaaatcatccattcgcttttacaaatttaaaaact
gatgaaggtttagaaaaagttattgagtggattgagcacgacgtcttactgaaagggtta 
acttaa 

Sequence 560

VVKRLAKKMSIGVITNDIYTKEDEKILVNTGVLPEDRIIGVETGGCPHTAIREDASMNFA
AIDELLERNDDIELIFIESGGDNLAATFSPELVDFSIYIIDVAQGEKIPRKGGQGMIKSD
FFIINKTDLAPYVGASLDQMAKDTEVFRGNHPFAFTNLKTDEGLEKVIEWIEHDVLLKGL
T*

Sequence 561

Contig_0490_pos_5557_0,

is similar to (with p-value 7.0e-46) 

>sp:sp|Q07400|URED_BACSB UREASE ACCESSORY PROTEIN URED. >pir
:pir|G36950|G36950 ureD protein - Bacillus sp. (strain TB-90
) >gp:gp|D14439|BACUREA_7 Thermophilic Bacillus genes for ur
ease subunits and urease accessory proteins, complete cds. N
ID: g393296.

gtgccaactttctatattgtcaatgtgggtggaggttatctagatggagatagataccgt 
gtcaatgtcaacttagaagataatgcacaagtgacgcttacttctcaaggtgcaactaaa 

atatataaaacgcctaatgaccatgtagaacagtatcaaacgtttaatttatcaaatcaa
tcgtatatggaatttgtagcagatcctattattgcctatgaaaacgctaaatttttccaa 
cataatacgtttaatcttaaagaagatagtgctatattttacacagatatattgactccc 
ggctattcatctaatggccaagatttcacgtataattatatgcatcttactaatgaaatt 
tacattgacaatcaattagttgttttcgataacatgatgttaagtcctgataaaagccga 

cttgacggtattgggtatatggaaaattatacacacttaggatcagcttattttattcat
ccagatgtaaaccaaagtttcatagacgatatttacgtggcggttgctgattttcaaaaa 
caatacgactgtagaataggtatctcacaattacctactcatggattggccgttcgtatt 
ttgactaaaagaactcaaataatagaagaaattttgactcgtgttcaatcat 

Sequence 562

VPTFYIVNVGGGYLDGDRYRVNVNLEDNAQVTLTSQGATKIYKTPNDHVEQYQTFNLSNQ
SYMEFVADPIIAYENAKFFQHNTFNLKEDSAIFYTDILTPGYSSNGQDFTYNYMHLTNEI 
YIDNQLVVFDNMMLSPDKSRLDGIGYMENYTHLGSAYFIHPDVNQSFIDDIYVAVADFQK 
QYDCRIGISQLPTHGLAVRILTKRTQIIEEILTRVQSX


Sequence 563 

Contig_0490_pos_3469_3101,

is similar to (with p-value 3.0e-34) 

>sp:sp|P02395|RL7_MICLU 50S RIBOSOMAL PROTEIN L7/L12 (MA1/MA
2). >pir:pir|A02771|R7MCML ribosomal protein L7/L12 - Microc
occus luteus

atggctaatcaagaacaaatcattgaagcaattaaagaaatgtcagtattagaattaaac 
gatttagtaaaagcaattgaagaagaatttggtgtaactgcagcagctccagtagcagca 
gcaggtgcagctggtggcggagatgcagcagctgaaaaaactgaatttgatgttgaatta 
acttcagctggatcttcaaaaattaaagttgttaaagcagttaaagaagcaactggctta
ggattaaaagatgctaaagaattagtagatggagctcctaaagtaattaaagaagctatg 
cctaaagaagatgctgaaaaacttaaagaacaattagaagaagttggagctagcgtagaa 
ttaaaatag 
Sequence 564

MANQEQIIEAIKEMSVLELNDLVKAIEEEFGVTAAAPVAAAGAAGGGDAAAEKTEFDVEL 
TSAGSSKIKVVKAVKEATGLGLKDAKELVDGAPKVIKEAMPKEDAEKLKEQLEEVGASVE
LK* 


Sequence 565 

Contig_0490_pos_3035_2301,

is similar to (with p-value 2.0e-81)

>pir:pir|S59955|S59955 hypothetical protein 202 - Staphyloco
ccus aureus >gp:gp|X64172|SARPLRPO_2 S. aureus rplL, orf202,
rpoB(rif) and rpoC genes for ribosomal protein L7/L12, hypot
hetical protein ORF202, DNA-directed RNA polymerase beta & b
eta' chains. NID: g677848.

atgcaaatacaacctgaattctttgagaagttaaaaccccgttattccaataacggggtt 
tttcaattaattaatacaagcatactaaatgcttttggattgaatgataaaaatagaggt
gaagaaatgagtcattattatgatgaacaacctgatgttaaaagtaacccaaaaagaatt 
agttatcaaattaaaaatgcgcaactagagcttactactgatgctggagttttttcaaaa 
gataatgtagattttggatctgacttactaattaaaacttttttaaaagaacatcctcca 
ggcccaagtaaaaccatcgcggatgtaggatgtggatatggtcctatcggtttagcaata 
ggaaaagtatctccacaccatcaaatcacaatgttggatattaacaatagagccttggcg
ttggcagaaatgaataagacgaaaaatcaagtggataatgtaacgattatagaaagcgat 
tgtttatctgctgtcaatcatcagtgctttgattacattttaactaatccccctattaga 
gctggtaaggacattgttcatcgaatctttgaacaagcgtttgacagactcaagactacg 
ggtgaactttatgtcgtcattcaaaaaaagcaaggtatgccttcagctaaaaagaaaata 
gaagaactatttggcaatgtagaaattatagctaagagtaaaggatattatattttgaaa
agtataaaaggttga 

Sequence 566 

MQIQPEFFEKLKPRYSNNGVFQLINTSILNAFGLNDKNRGEEMSHYYDEQPDVKSNPKRI 
SYQIKNAQLELTTDAGVFSKDNVDFGSDLLIKTFLKEHPPGPSKTIADVGCGYGPIGLAI
GKVSPHHQITMLDINNRALALAEMNKTKNQVDNVTIIESDCLSAVNHQCFDYILTNPPIR 
AGKDIVHRIFEQAFDRLKTTGELYVVIQKKQGMPSAKKKIEELFGNVEIIAKSKGYYILK 
SIKG*

Sequence 567

Contig_0490_pos_0_1944,

is similar to (with p-value 0.0e+00) 

>sp:sp|P47768|RPOB_STAAU DNA-DIRECTED RNA POLYMERASE BETA CH
AIN (EC 2.7.7.6) (TRANSCRIPTASE BETA CHAIN) (RNA POLYMERASE
BETA SUBUNIT). >pir:pir|S59951|S59951 DNA-directed RNA polym
erase (EC 2.7.7.6) beta chain - Staphylococcus aureus >gp:gp
|X64172|SARPLRPO_3 S. aureus rplL, orf202, rpoB(rif) and rpoC
genes for ribosomal protein L7/L12, hypothetical protein OR
F202, DNA-directed RNA polymerase beta & beta' chains. NID:
g677848.

atgttcagagatatttctccaattgaagatttcactggcaacctatctttagaatttgta 
gattacagattaggtgaacctaaatatgatttagaagaatcaaaaaaccgtgacgctact 
tatgctgcacctcttcgtgtgaaagtgcgtcttattattaaagaaactggtgaagttaaa 
gaacaagaagtcttcatgggtgattttccactgatgacagatacaggtacgttcgtaatt 

aatggtgctgaacgtgttatcgtatctcaattagttcgttcaccatccgtgtattttaac
gagaaaatcgataaaaatggacgtgaaaactatgatgcaacaatcattcctaaccgtggt 
gcttggttagagtatgagacagatgctaaagatgttgtatatgttcgtatcgatagaaca 
cgtaaattaccattgactgtattactacgtgcgctaggtttctcaactgatcaagaaatt 
gttgacttattaggagacagcgaatatttacgtaatacattagaaaaagatgggacagaa 

aatacagaacaggctttattagagatttatgaacgtttacgtcctggcgaaccaccaaca
gtagaaaatgctaaaagtttattatattctcgtttcttcgaccctaaacgctatgattta 
gccagtgtaggtcgttataaagcgaacaaaaaattacacctaaaacatcgtttgttcaat 
caaaaattagcagaaccaattgttaacagtgaaactggtgaaattgttgttgacgaagga 
acagtgttagatcgtcgtaaacttgacgaaatcatggacgtattagaaacaaacgctaat 






agcgaagtatttgaacttgaaggtagcgtaattgacgaacctgtagaaatccaatctatt 
aaagtgtatgtgcctaacgatgaagaaggtcgtacgactactgtcattggtaatgcatta 
cctgattctgaagttaaatgtattactccagcagatattgttgcctcaatgagttatttc 
ttcaacttattgaatggcattggttatacagatgatattgatcatctaggtaatcgtcgt 
ttacgttctgtcggtgagctattacaaaatcaattccgtatcggtttatccagaatggaa
cgtgttgttcgtgaaagaatgtcaatacaagatacagattctattacgccacaacaactc 
attaatatcagaccagttattgcatcaatcaaagaattctttggtagttcacaattatct 
caattcatggaccaagctaacccgttagcagagttaacgcacaaacgtcgtttatctgct 
ctagggcctggtggattaacacgtgaacgtgctcaaatggaagtgcgtgacgttcactac 

tctcactatgggcgtatgtgtccaattgaaacacctgagggtcctaatattggtttaata
aactcattgtcaagttatgctagagtgaatgaatttggttttattgaaacgccatatcgt 
aaagtggatttagatacaaactcaatcactgatcaaatagattatttgacagctgatgaa 
gaggatagttacgttgttgcacaggctaattctagacttgatgaaaatggtcgtttctta 
gatgatgaagttgtttgtcgtttccgtggtaataacactgttatggctaaagaaaaaatg 

gattacatggacgtatcaccaaaacaagttgtttcagcagcaacagcatgtattccattc
ttagaaaatgacgactctaaccgtgcgttaatgggagcaaacatgcaacgtcaagcggtg 
cctttaatgaatccggaagctccatttgtgggtacaggtatggaacacgtatccgcaaga 
gactctggtgctgcaattactgctaagcatagaggacgcgttgagcatgttgaatctaat 
gaaattttagttcgtcgtttagtc 


Sequence 568 

MFRDISPIEDFTGNLSLEFVDYRLGEPKYDLEESKNRDATYAAPLRVKVRLIIKETGEVK 
EQEVFMGDFPLMTDTGTFVINGAERVIVSQLVRSPSVYFNEKIDKNGRENYDATIIPNRG
AWLEYETDAKDVVYVRIDRTRKLPLTVLLRALGFSTDQEIVDLLGDSEYLRNTLEKDGTE 

NTEQALLEIYERLRPGEPPTVENAKSLLYSRFFDPKRYDLASVGRYKANKKLHLKHRLFN
QKLAEPIVNSETGEIVVDEGTVLDRRKLDEIMDVLETNANSEVFELEGSVIDEPVEIQSI
KVYVPNDEEGRTTTVIGNALPDSEVKCITPADIVASMSYFFNLLNGIGYTDDIDHLGNRR 
LRSVGELLQNQFRIGLSRMERVVRERMSIQDTDSITPQQLINIRPVIASIKEFFGSSQLS 
QFMDQANPLAELTHKRRLSALGPGGLTRERAQMEVRDVHYSHYGRMCPIETPEGPNIGLI 

NSLSSYARVNEFGFIETPYRKVDLDTNSITDQIDYLTADEEDSYVVAQANSRLDENGRFL
DDEVVCRFRGNNTVMAKEKMDYMDVSPKQVVSAATACIPFLENDDSNRALMGANMQRQAV
PLMNPEAPFVGTGMEHVSARDSGAAITAKHRGRVEHVESNEILVRRLV 

Sequence 569 
Contig_0491_pos_1640_0,

putative peptide of unknown function 

gtggatgatgtgacaaaatatggtccagttgatggagatccgatcacgtcaacggaagaa 
attccattcgacaagaaacgtgaattcaatcctgatttaaaaccaggtgaagagcgtgtt 
aaacaaaaaggtgaaccaggaacaaaaacaattacaacaccaacaactaagaacccatta 

acaggggaaaaagttggcgaaggtgaaccaacagaaaaaataacaaaacaaccagtagat
gaaatcacagaatatggtggcgaagaaatcaagccaggccataaggatgaatttgatcca 
aatgcaccgaaaggtagccaagaggacgttccaggtaaaccaggagttaaaaaccctgat 
acaggcgaagtagtcacaccaccagtggatgatgtgacaaaatatggtccagttgatgga 
gatccgatcacgtcaacggaagaaattccattcgacaagaaacgtgaattcaatcctgat 

ttaaaaccaggtaaagagcgcgttaaacagaaaggtgaaccaggaacaaaaacaattaca
acaccaacaactaagaacccattaacaggggaaaaagttggcgaaggtgaaccaacagaa 
aaagtaacaaaacaaccagtagatgaaatcacagaatatggtggcgaagaaatcaagcca 
ggccataaggatgaatttgatccaaatgcaccgaaaggtagccaagaggacgttccaggt 
aaaccaggagttaaaaatcctgatacaggcgaagtagttactccaccagtggatgatgtg 

acaaaatatggtccagttgatggagatccgattacgtcaacggaagaaattccgtttgat
aaaaaacgcgaatttgatccaaacttagcgccaggtacagagaaagtcgttcaaaaaggt 
gaaccaggaacaaaaacaattacaacaccaacaactaagaacccattaacaggggaaaaa 
gttggcgaaggtgaaccaacagaaaaagtaacaaaacaaccagtggatgaaatcgttcat 
tatggtggcgaagaaatcaagccaggccataaggatgaatttgatccaaatgcaccgaaa 

ggtagccaagaggacgttccaggtaaaccaggagttaaaaaccctgatacaggcgaagta
gttactccaccagtggatgatgtgacaaaatatggtccagttgatggagatccgattacg 
tcaacggaagaaattccgtttgataaaaaacgcgaatttgatccaaacttagcgccaggt 
acagagaaagtcgttcaaaaaggtgaaccaggaacaaaaacaattacaacaccaacaact 
aagaacccattaacaggggaaaaagttggcgaaggtgaaccaacagaaaaagtaacaaaa 
caaccagtggatgaaatcgttcattatggtggcgaagaaatcaagccaggccataaggat
gaatttgatccaaatgcaccgaaaggtagtcaaacaacgcaaccaggtaagccgggggtt 
aaaaatcctgatacaggcgaagtagttactccacctgtggatgatgtgacaaaatatggt 
ccagttgatggagatccgatcacgtcaacggaagaaattccattcgacaagaaacgtgaa 
ttcaatcctgatttaaaaccaggtgaagagcgtgttaaacaaaaaggtgaaccaggaaca
aaaacaattacaacaccaacaactaagaacccattaacaggggaaaaagttggcgaaggt 
gaaccaacagaaaaaataacaaaacaaccagtagatgaaatcacagaatatggtggcgaa 
gaaatcaagccaggccataaggatgaatttgatccaaatgcaccgaaaggtagccaagag 
gacgttccaggtaaaccaggagttaaaaaccctgatacaggcgaagtagtcacaccacca 
gtggatgatgtgacaaaatatggtccagttgatggagatccgatcacgtcaacggaagaa
attccattcgacaagaaacgtgaattcaatcctgatttaaaaccaggtaaagagcgcgtt 
aaacagaaaggtgaaccaggaacaaaaacaattacaacaccaacaactaagaacccatta 
acaggggaaaaagttggcgaaggtgaaccaacagaaaaagtaaca 

Sequence 570

VDDVTKYGPVDGDPITSTEEIPFDKKREFNPDLKPGEERVKQKGEPGTKTITTPTTKNPL
TGEKVGEGEPTEKITKQPVDEITEYGGEEIKPGHKDEFDPNAPKGSQEDVPGKPGVKNPD 
TGEVVTPPVDDVTKYGPVDGDPITSTEEIPFDKKREFNPDLKPGKERVKQKGEPGTKTIT 
TPTTKNPLTGEKVGEGEPTEKVTKQPVDEITEYGGEEIKPGHKDEFDPNAPKGSQEDVPG 

KPGVKNPDTGEVVTPPVDDVTKYGPVDGDPITSTEEIPFDKKREFDPNLAPGTEKVVQKG
EPGTKTITTPTTKNPLTGEKVGEGEPTEKVTKQPVDEIVHYGGEEIKPGHKDEFDPNAPK 
GSQEDVPGKPGVKNPDTGEVVTPPVDDVTKYGPVDGDPITSTEEIPFDKKREFDPNLAPG 
TEKVVQKGEPGTKTITTPTTKNPLTGEKVGEGEPTEKVTKQPVDEIVHYGGEEIKPGHKD 
EFDPNAPKGSQTTQPGKPGVKNPDTGEVVTPPVDDVTKYGPVDGDPITSTEEIPFDKKRE 

FNPDLKPGEERVKQKGEPGTKTITTPTTKNPLTGEKVGEGEPTEKITKQPVDEITEYGGE
EIKPGHKDEFDPNAPKGSQEDVPGKPGVKNPDTGEVVTPPVDDVTKYGPVDGDPITSTEE 
IPFDKKREFNPDLKPGKERVKQKGEPGTKTITTPTTKNPLTGEKVGEGEPTEKVT 

Sequence 571 
Contig_0491_pos_3423_3109,

putative peptide of unknown function 

gtgatttcatctactggttgttttgttattttttctgttggttcaccttcgccaactttt 
tcccctgttaatgggttcttagttgttggtgttgtaattgtttttgttcctggttcacct 
ttttgtttaacacgctcttcacctggttttaaatcaggattgaattcacgtttcttgtcg 
aatggaatttcttccgttgacgtgatcggatctccatcaactggaccatattttgtcaca
tcatccacaggtggagtaactacttcgcctgtatcaggatttttaacccccggcttacct 
ggttgcgttgtttga 

Sequence 572 

VISSTGCFVIFSVGSPSPTFSPVNGFLVVGVVIVFVPGSPFCLTRSSPGFKSGLNSRFLS
NGISSVDVIGSPSTGPYFVTSSTGGVTTSPVSGFLTPGLPGCVV*

Sequence 573

Contig_0491_pos_2271_1351,

putative peptide of unknown function

gtgatttcatctactggttgttttgttactttttctgttggttcaccttcgccaactttt 
tcccctgttaatgggttcttagttgttggtgttgtaattgtttttgttcctggttcacct 
ttctgtttaacgcgctctttacctggttttaaatcaggattgaattcacgtttcttgtcg 
aatggaatttcttccgttgacgtgatcggatctccatcaactggaccatattttgtcaca 

tcatccactggtggtgtgactacttcgcctgtatcagggtttttaactcctggtttacct
ggaacgtcctcttggctacctttcggtgcatttggatcaaattcatccttatggcctggc 
ttgatttcttcgccaccatattctgtgatttcatctactggttgttttgttattttttct 
gttggttcaccttcgccaactttttcccctgttaatgggttcttagttgttggtgttgta 
attgtttttgttcctggttcacctttttgtttaacacgctcttcacctggttttaaatca 

ggattgaattcacgtttcttgtcgaatggaatttcttccgttgacgtgatcggatctcca
tcaactggaccatattttgtcacatcatccacaggtggagtaactacttcgcctgtatca 
ggatttttaaccccccggcttacctggttgcgttgtttgactacctttcggtgcatttgg 
atcaaattcatccttatggcctggcttgatttcttcgccaccataatgaacgatttcatc 
cactggttgttttgttattttttctgttggttcaccttcgccaactttttctcctgtatt 
aggattgacataagttggtgttgttgttgtttcaattcctggttcacctttttggactac
tttttctgtacctggggctaa 

Sequence 574 

VISSTGCFVTFSVGSPSPTFSPVNGFLVVGVVIVFVPGSPFCLTRSLPGFKSGLNSRFLS
NGISSVDVIGSPSTGPYFVTSSTGGVTTSPVSGFLTPGLPGTSSWLPFGAFGSNSSLWPG 
LISSPPYSVISSTGCFVIFSVGSPSPTFSPVNGFLVVGVVIVFVPGSPFCLTRSSPGFKS 
GLNSRFLSNGISSVDVIGSPSTGPYFVTSSTGGVTTSPVSGFLTPRLTWLRCLTTFRCIW 
IKFILMAWLDFFATIMNDFIHWLFCYFFCWFTFANFFSCIRIDISWCCCCFNSWFTFLDY 
FFCTWG*

Sequence 575 

Contig_0491_pos_518_78,

putative peptide of unknown function 

atgacgtacttgcgctattggagtcacttacgctcgttgatgttgacgcactttctgaag
tcgatgtactcgttgatcctgaaagactcgtacttgtcgagccacttaatgacgtacttg 
tactgtttgattcactcgttgatgcagatgcgctatcagacatcgacgtactcgctgatt
ctgataacttcttacttgtactcgctgattcactctcactcgttgatgtggatgcacttt 
ctgatgtcgacgtgcttgttgaatctgaaacgcttgtgcttgtcgactcacttaaagatg 

tgcttgcactgtttgagtcgctcacacttgttgacgttgacgcactgtctgatgtcgatg
tactcgttgaatccgaaatgcttgtacttgtcgagtcacttaaggacgtacttgcactgt 
ttgagtagcttacactcatag 

Sequence 576 

MTYLRYWSHLRSLMLTHFLKSMYSLILKDSYLSSHLMTYLYCLIHSLMQMRYQTSTYSLI
LITSYLYSLIHSHSLMWMHFLMSTCLLNLKRLCLSTHLKMCLHCLSRSHLLTLTHCLMSM 
YSLNPKCLYLSSHLRTYLHCLSSLHS* 

Sequence 577 
Contig_0493_pos_2737_3663,

is similar to (with p-value 0.0e+00) 

>gp:gp|Z99108|BSUB0005_72 Bacillus subtilis complete genome
(section 5 of 21): from 802821 to 1011250. NID: g2633055. >g
p:gp|D78509|D78509_8 Bacillus subtilis YfjG-YfjR genes, comp
lete cds. NID: g2780390.

atggaagacgtgacagatattgtctttcggcatgttgtcagtgaagctgcgagaccagat 
gtattttttactgaatttaccaatactgagagttactgtcaccctgaaggtattcatagt 
gtgcgcggacgcttaacttttagtgacgacgaacaaccaatggtagcgcacatctggggc 
gataaaccagaacaattccgagaaatgagtatcggcttagcggatatgggttttaaaggt 

atagatttaaatatgggttgccctgtcgcaaacgttgcgaaaaaaggtaaaggatccggc
ttaattctacgacctgaaacggcagccgaaatcattcaagcttctaaagcaggtggtcta 
ccggtcagtgtaaaaacacgtttaggttattacgatatcgatgaatggcgagactggtta 
aaacacgtcttcgaacaagatatcgcaaatttatccattcatctacgtacccgtaaagag 
atgagtaaagtagatgcacactgggaattaatcgaagcaatcaagacattacgtgatgaa 

attgcgccaaatacactattaactatcaatggtgatatccccgatagacaaactggtcta
gaactcgcaaataaatatggtattgatggcattatgattggtagagggatcttccataac 
ccattcgcatttgaaaaggaaccacgcgaacattcaagcaaagaattattaggtttatta 
cgcttacatctctctttatttgaaaaatatgataaagatgaagcccgacacttcaaaagt 
ttacgcagattcttcaaaatctacgtacgcggcattagaggcgctagcgaactccgccat 

caattaatgaacacccaatccattgccgaagcaagagaactactcgatacttttgaagca
cgtatggatgcacgttcagaagtataa 

Sequence 578 

MEDVTDIVFRHVVSEAARPDVFFTEFTNTESYCHPEGIHSVRGRLTFSDDEQPMVAHIWG
DKPEQFREMSIGLADMGFKGIDLNMGCPVANVAKKGKGSGLILRPETAAEIIQASKAGGL
PVSVKTRLGYYDIDEWRDWLKHVFEQDIANLSIHLRTRKEMSKVDAHWELIEAIKTLRDE
IAPNTLLTINGDIPDRQTGLELANKYGIDGIMIGRGIFHNPFAFEKEPREHSSKELLGLL
RLHLSLFEKYDKDEARHFKSLRRFFKIYVRGIRGASELRHQLMNTQSIAEARELLDTFEA
RMDARSEV* 
Sequence 579

Contig_0493_pos_5647_6057,
putative peptide of unknown function 
atgtcaaaaaagaaaatcttgatctttattagtgtcatattaatcatttttgggggcttt
tatctcaaaatgaaatataacgaaaaagaaaaacagaaagaaatctactacaaagagcaa 
caagaacgtatcacgctttatcttaaatacaacactaaagaacctaatatcatcaaatct 
gtccatttcacaagtttaaaacagggaccaatgggtgacgctgttattgaaggctatatc 
aataacaataaaaaagatgattttgttgcatttgcatcacctgaaaacaattatcaattt 
ggaggcagacttatagcagacgttaaaatatttaaattacttaaaccggctaatgaatct
aaatcacccgatgaaatcaaaaaagatttagacaaaaagaaagaacactaa 

Sequence 580 

MSKKKILIFISVILIIFGGFYLKMKYNEKEKQKEIYYKEQQERITLYLKYNTKEPNIIKS
VHFTSLKQGPMGDAVIEGYINNNKKDDFVAFASPENNYQFGGRLIADVKIFKLLKPANES
KSPDEIKKDLDKKKEH* 

Sequence 581 

Contig_0493_pos_8145_8723,

putative peptide of unknown function

atgctaggatttgcagggggattgggatacagtcattataaagattcaaaatcgaacact 
gatgtagcttcaaaagagactcagacttccaataaaaacactcatgaagatacaacttca 
caaggtaaaatgcaaaatcaagttaatagccaaacaaacgaagtatcaaatgggacatca 
actaaaacacttagtgaaaaagcaaagcagttaagagaagcttttaacgtcaatgatgag 

gaagctcaaattttagcagatgaaatcgatagagcagatgtaaataaagatggcacgatt
acaacggatgaaatgacgcctactttagatcgttttacaaaagaagggaaattccaacca 
tctgctggtggtacaactagcgaaacacctcaccctaaatatacagcagaagatgctaga 
catatgtctgatgatgaatttctagacgcgtatacagaaggcatgtcagatgatgaagct 
gctactattcacgaaagtgctcaagaatctaacgagtatatgaaatttttaagaggacaa 

gttgaagcacgtgcaaaaggacagggcggtaattattaa

Sequence 582 

MLGFAGGLGYSHYKDSKSNTDVASKETQTSNKNTHEDTTSQGKMQNQVNSQTNEVSNGTS 
TKTLSEKAKQLREAFNVNDEEAQILADEIDRADVNKDGTITTDEMTPTLDRFTKEGKFQP 
SAGGTTSETPHPKYTAEDARHMSDDEFLDAYTEGMSDDEAATIHESAQESNEYMKFLRGQ
VEARAKGQGGNY* 

Sequence 583 

Contig_0493_pos_12191_11406,

putative peptide of unknown function

atgcaacattcaagcaaaataatagtatttgtaagtttcttaattttaacgatttttatt 
ggaggatgtggttttataaataaagaagatagcaaagaaacggaaatcaaacaaaacttt 
aataaaatgttagacgtgtatccaactaaaaatctagaagacttttatgataaagagggc 
tatcgtgatgaagagtttgataaagatgacaaaggaacatggattattaggtctgaaatg 

acaaaacagccaaaaggtaaaattatgacctcaagaggtatggttctctatatcaatcgc
aacactagaacagccaaagggtattttttaatagataagataaaagatgatagtaatggt
agaccgatagagaatgaaaagaaataccctgtaaaaatgaaccataataagatctttcca 
acaaagccaatatctgatgataagttaaaaaaagaaattgaaaacttcaaattttttgtg 
caatatggaaattttaaaaacttaaaggattataaaaacggggatattttatacaatcct 

aatgttcctagttattctgcgaaatatcaattgagtaataatgaatataacgtacaacaa
ttaagaaaaagatatgacatcccaactaaaaaagcacctaaactattgttaaaaggggat 
ggcgacttaaaaggatcatccgtaggtcatagagacctagaatttacctttgtagagaat 
aagaaagaaaacatcttttttacggatagtattaattttaaaccgactgagcgtgatgaa 
tcatga 


Sequence 584 

MQHSSKIIVFVSFLILTIFIGGCGFINKEDSKETEIKQNFNKMLDVYPTKNLEDFYDKEG
YRDEEFDKDDKGTWIIRSEMTKQPKGKIMTSRGMVLYINRNTRTAKGYFLIDKIKDDSNG
RPIENEKKYPVKMNHNKIFPTKPISDDKLKKEIENFKFFVQYGNFKNLKDYKNGDILYNP 
NVPSYSAKYQLSNNEYNVQQLRKRYDIPTKKAPKLLLKGDGDLKGSSVGHRDLEFTFVEN
KKENIFFTDSINFKPTERDES* 

Sequence 585 
Contig_0493_pos_11134_10166,

is similar to (with p-value 2.0e-20) 

>gp:gp|AJ222587|BS16829KB_25 Bacillus subtilis 29kB DNA frag
ment from ykwC gene to cse15 gene. NID: g2632216. >gp:gp|Z99
111|BSUB0008_93 Bacillus subtilis complete genome (section 8
of 21): from 1394791 to 1603020. NID: g2633699.

atgacttactgctcattatctataaagatttatacattgaaaatattaattgttttaaca 
ataggaggtcatgttgttattatgagtcagtttaaggacacattatataaactatttgag 
ccaatgatgaaaatagagttctatcaaaatcttttggttaatcttttaattatacttgct 
tatatcttgatgggtatgattgtaattgcgatatcaagaaagttagttactaaatttttc 

aacgttaatgaaaagaaaaagaaccgtcataaaattaagagaagtgaaacactatccaca
ttgattcaaaatttaataagttatgtcgtatggtttattgtccttacgtcaatactttca 
cgtttcggtattagtgtatcagcaattttagcaggagctggagttgttggtgttgccgtt 
ggtttcggagcacaaacaattgtaaaagacattattactggtttctttatcatatttgaa 
ggacagtttgatgtgagtgattatgttcaaattaatgcatctggggtaacaattgctgaa 

ggtacggttaaaacgattggtttaagatcaacgcgtatacaatcagatactggagaaatt
tatacattacctaatggtatgattagtgaaatagttaattattctgctacagatgtttca 
cctattgtgatgataccgatttctccaaatgagaattataaagtgatagaagagaaatta 
ttaacatttttacctacattaaagaataaatatgacatatttgtatccgcaccagattta 
cttggtttagatagtgttgatggcaatgaaatggtgattaaacttttagcacatgttaag 

cctggaatgcattttccaggacaacgtttacttcgtaaagaggtcatacaatactttagt
gaagaaggcattcatattccgaaaccaacacttgtaaaacttgataaagaattgaataaa 
aaagaatag 

Sequence 586 

MTYCSLSIKIYTLKILIVLTIGGHVVIMSQFKDTLYKLFEPMMKIEFYQNLLVNLLIILA
YILMGMIVIAISRKLVTKFFNVNEKKKNRHKIKRSETLSTLIQNLISYVVWFIVLTSILS 
RFGISVSAILAGAGVVGVAVGFGAQTIVKDIITGFFIIFEGQFDVSDYVQINASGVTIAE
GTVKTIGLRSTRIQSDTGEIYTLPNGMISEIVNYSATDVSPIVMIPISPNENYKVIEEKL
LTFLPTLKNKYDIFVSAPDLLGLDSVDGNEMVIKLLAHVKPGMHFPGQRLLRKEVIQYFS

EEGIHIPKPTLVKLDKELNKKE*

Sequence 587 

Contig_0493_pos_9804_8986,

putative peptide of unknown function 

atgaagatgacaaaacgcttaatattattcgactttgatgaaacttactacaaacatcat
acgcatcaagcgaatatgccttatttaagagaaatggaaggtttattacagaatataact 
actaaaaacaatgtcattacggctattttaacaggaagtactatagaaagcgtacttcaa 
aaaatgagtaacgttggtatgtcatataaacctcaacatattttttcagatttaagttct 
aaaatgtttacatggaataactgtgaatatattgaatctgatgaatataaaaacgaagtg 

ttgacagaacgtttcttattggaagatatattagatatattaaaacatgtttcttctaaa
cataaagtagcgtttataccacaaagaacttttcgagacaatgaaacattgtacaatttc 
tatctctattcttcgggtgacacgcatttagataaaacaattttagaagacctcactcag 
tattctaagataagggactatacgatgacatttaatcgttgtaatcctttagcaggtgat 
cctgaaaatgcttatgatattaattttactccaagaaatgcaggaaaattatatgccaca 

aaatttttgatgaataaatatggtgttccaaaagaattgattattggctttggtgatagt
ggtaatgatgaagcgtttttaagttatttagatcacgcaatgattatgtctaacagtcaa 
gatgaggaaatgaagcgtaaatttaaaaatacaaaatatccttattacaaaggtatttat 
acacatgtacgcgaatttatagaatctgataatgtttaa 

Sequence 588

MKMTKRLILFDFDETYYKHHTHQANMPYLREMEGLLQNITTKNNVITAILTGSTIESVLQ 
KMSNVGMSYKPQHIFSDLSSKMFTWNNCEYIESDEYKNEVLTERFLLEDILDILKHVSSK 
HKVAFIPQRTFRDNETLYNFYLYSSGDTHLDKTILEDLTQYSKIRDYTMTFNRCNPLAGD
PENAYDINFTPRNAGKLYATKFLMNKYGVPKELIIGFGDSGNDEAFLSYLDHAMIMSNSQ 
DEEMKRKFKNTKYPYYKGIYTHVREFIESDNV*



Sequence 589 

Contig_0493_pos_7713_6802,
putative peptide of unknown function

atgaaaaaacatgtgttagcaatcggagcaattacagtaacaacattacttgcaggatgc 
gattttggagatttggtaggacagcatcagactgataagcaatcagaaaatagtaacact 
caaaccgagcaagcttcaaataataaaaattcaaattctaataatggtcattcaagtaat 
aacaataagtctagagatagagtcgaggatttaactcaatcgcaaaaagtagcattagct 

atcaatgatccctcagtttctcaatatgccgttaacgcaagtgaattaagaaatcattca
ttttatgcaaactataacggtgggggccaacgtaaaagtattcatacgtatcagttggaa 
gcattaccaacaaaagtagaaggtgcacctagtgatatgaaattctatactgcaaaacca 
tctaaaggatcatttgtgacgcttatcggtattggtaatgaaaaggtattaattgcaggc 
acacaaagttcaggaacatatcaacaatatgcgcattcggaagcagcaagagaattagat 

ttacatgaactattaaataaatacggtaaaagttcaaattataaagatatagcaaatcaa
attgcgtttacacaaagtcaatcatcaaattcaagtgaatctaacacgtcagatgaaggg 
acgtctaacagtgatagtgatacatctaatgatgataaagttacacgtagtaatgtgata 
gataaagttgaagcgtatgaaggtcacccattagatactgatacatatacatttaaagaa 
cccgaacagaacgaagacggagattggggcttttctattttagataaggaaggcaacctc 

gaaggttcttatattgtgacatctgatggtgaagttacaaaatacgatgaaaacggggaa
gaaatagagtaa 

Sequence 590 

MKKHVLAIGAITVTTLLAGCDFGDLVGQHQTDKQSENSNTQTEQASNNKNSNSNNGHSSN 
NNKSRDRVEDLTQSQKVALAINDPSVSQYAVNASELRNHSFYANYNGGGQRKSIHTYQLE
ALPTKVEGAPSDMKFYTAKPSKGSFVTLIGIGNEKVLIAGTQSSGTYQQYAHSEAARELD 
LHELLNKYGKSSNYKDIANQIAFTQSQSSNSSESNTSDEGTSNSDSDTSNDDKVTRSNVI 
DKVEAYEGHPLDTDTYTFKEPEQNEDGDWGFSILDKEGNLEGSYIVTSDGEVTKYDENGE 
EIE* 


Sequence 591 

Contig_0494_pos_7785_8111,

putative peptide of unknown function 

gtgctagacacatctttagctttatcaagtgcttttgtacctatgtcttttgctttgtca 
aacgtagtaccacctatgtctcttgctttaccaccaatgtcttttactttattaacatca
ttttttacaccagagctaataacatctaatataccgtctttctttttagtacctttacta 
aatttaggtagttttttcttcttagtgtcatatcctgaattagataaaatagcatgtgtt 
tgcgcgccactatatacacttgagcctttaggtaagaatgttgttgtatctttgttaggt 
gttagtgccattcttccgttaggataa 


Sequence 592 

VLDTSLALSSAFVPMSFALSNVVPPMSLALPPMSFTLLTSFFTPELITSNIPSFFLVPLL
NLGSFFFLVSYPELDKIACVCAPLYTLEPLGKNVVVSLLGVSAILPLG* 

Sequence 593

Contig_0494_pos_8613_9062,

putative peptide of unknown function 

atgaaactgccaactgctttaaatatgccaattgtaccttttttgagtgcattccatgta 
tttttaactccagaccataatgcttttgctttattcactactgatttttttatagccgtc 

cagattttaactgcggcattctttacagcattaaatattacaacaataccttttttaat
gcgttaaatacagatagaacacctttgcgcaaagctcgaacgattcctaacacaccattt 
tttaatgcagtccaaactttaatagagaaactcttaatagcattaaatatcgtaactact 
atgcgcttaataaggttgatattaaattttacttgcgcaacatatgctttaataattgct
ataacaccgttctttaaggcggtccagattttaatagcagcatttttcatgccattccat 

aaagctgataagacattttttaatgcttga

Sequence 594 

MKLPTALNMPIVPFLSAFHVFLTPDHNAFALFTTDFFIAVQILTAAFFTALNITTIPFFN 
ALNTDRTPLRKARTIPNTPFFNAVQTLIEKLLIALNIVTTMRLIRLILNFTCATYALIIA 
ITPFFKAVQILIAAFFMPFHKADKTFFNA*
Sequence 595 

Contig_0494_pos_15336_15983,
putative peptide of unknown function

gtgtctacttcccaattgattgtttcgaattccggacgagctaactcagggtttttctct 
aattcagcaacagtgttgaatttagcgttagcacgttttaagatcgggtatttcccactt 
gcagttgatactgaagttttttgtaccaattctgataagtcttggactgtcttaacttct 
ttttcaggaatatatttaatatcctctgggatagttacgccaacgtcatcagatttaaca 

ttgtcacgtttagccccttttgatttcatgtactgttcaaatgctagaatttcttcgttt
gtctctggattttggtttaatttagccatagaacgtttcgctccttcttttttgtctttt 
tctttttttaattcttcttctgttggttcttctactttttcaatagtaggtgtttctggt 
gtttcttcaggtttgtcatctggttttggtgcatcatcaggtttttcttcatctgaagtt 
ccttctggttcatcatcagaaggtttgttctctgattcttctccagaattaccatctttg 

ttatcttcaacttctgcaccttcatctttaggtggttcatcttgtttaggtgctgacgct
tcaatttcttttgaaagctgttcgagttcttcgtactctttcttttga 

Sequence 596 

VSTSQLIVSNSGRANSGFFSNSATVLNLALARFKIGYFPLAVDTEVFCTNSDKSWTVLTS 
FSGIYLISSGIVTPTSSDLTLSRLAPFDFMYCSNARISSFVSGFWFNLAIERFAPSFLSF
SFFNSSSVGSSTFSIVGVSGVSSGLSSGFGASSGFSSSEVPSGSSSEGLFSDSSPELPSL 
LSSTSAPSSLGGSSCLGADASISFESCSSSSYSFF* 

Sequence 597 
Contig_0494_pos_16481_16140,

putative peptide of unknown function 

gtggatgaaaaagggctatactttaaatgccacttacctaatacatcatacgcaagagat 
atttatgagaatattaaagcaggcaacgttaatcagtgcagtttcttttacacattgcca 
cctaatgactcaacggctcgtacgtggcaaaacatagataatgagtacgttcaaaccata
aataaaatcgatgaattgattgaggttagtattgttacagtgccagcctacaaagataca
tcggttgaagtcggtcaacgtgcgaaagacttaaagaaattcaaacagttggaacaaatg 
aagatagcattggatttagaaagcctacgttttgaaacgtaa 

Sequence 598 

VDEKGLYFKCHLPNTSYARDIYENIKAGNVNQCSFFYTLPPNDSTARTWQNIDNEYVQTI
NKIDELIEVSIVTVPAYKDTSVEVGQRAKDLKKFKQLEQMKIALDLESLRFET* 

Sequence 599 

Contig_0494_pos_13618_13271,

putative peptide of unknown function

atgaatcatgttcaaaagaacaatattaaattctttgattatccaaacgcacaagaaatt 
agagatgtagtgattgtcatagatccattaggccaagacacccctataacttacggagat 
gatttccctatagcctttgaagatttgtatcaaatagacgtgtttgtgaagcaacaaaaa 
aacatcaatggaagattaacagctaaaaaaataacttttgagatagctaaggttttacga 

acgataaatgtatatgacactggcggagctataaagcctgaatatataaaagattttaat
atttacagacaaattaaaagatttgaagtaaagcaatcactcgtataa 

Sequence 600 

MNHVQKNNIKFFDYPNAQEIRDVVIVIDPLGQDTPITYGDDFPIAFEDLYQIDVFVKQQK 
NINGRLTAKKITFEIAKVLRTINVYDTGGAIKPEYIKDFNIYRQIKRFEVKQSLV*

Sequence 601 

Contig_0494_pos_6051_5587,
putative peptide of unknown function 
gtgacttattctgtctacaaaggatatgcagaatcattaaaagatacttctgaatttagt
tggactgatgaaagttggcaatttgaacaaggtgttataggaagtgatgaagttaaatat 
aaacacaatattcgttactttaaaatatttaacggttctaaagatactattaacccttta 
ttaagacacaaattaaatattaattgcacacttacagcaccttatggatttgaaatcgtt 
aatctaaccacaaatgatatatttgaatataaaaaaccgctcaaaaagcgtaatacggtt 
tctattataggagtgcatccttatattaataataaaagagttggtaaagacacaaattat
gattttattactttagcgccgggttggaatgaaattttaattagaggtcacaatatatcc 
aatagtcctaaaacagaatttatatttaattacatctataggtag 

Sequence 602

VTYSVYKGYAESLKDTSEFSWTDESWQFEQGVIGSDEVKYKHNIRYFKIFNGSKDTINPL 
LRHKLNINCTLTAPYGFEIVNLTTNDIFEYKKPLKKRNTVSIIGVHPYINNKRVGKDTNY
DFITLAPGWNEILIRGHNISNSPKTEFIFNYIYR* 

Sequence 603

Contig_0494_pos_5586_4018,

putative peptide of unknown function 

gtgagaatattggaaaatctaatatttatgaatagagaagggacattttcggaaattgtt 
aatgactttgattttggttcctttaaatatgaatatgaacaaaataatgagcgatccata 

tctctcactgcttataaaactaatgttaacgcggatatatttgatagtttgattaatgaa
aattatttagtttggaagggccagaaatatgtcattaaatcgactgagcttaagtatgaa 
gaaggtgtaatacttaatgaaattgaggctaagcatatttctatggaatttcaaaatcat
tatatacctaaagatttagatgatgagtcactgaatgatgaagatgagactgaagcaaaa 
atttccatgaaagttaaagagtaccttgattttgcattcaaaaataataaacttaatttc 

gattataagttacatggaaaatttaatgagagtaaatatattgaacagttaggagataaa
aatggtttagaacatcttattgaaggtgctgagcattttggctatatattttttgctgat 
aataaaactttccatatctatacacctgataatttttataaaaaatcagatgaaatatta 
gtttataaatataataatagttcggtttcggctaaaacaatcacaactgaattacgcacc 
tacattcaaggatatggaaagaaaaagtcgaaatccgaaacgaaaaactataaacctata 

aaacctaaagatttctcatactctggaaattttaataaagaagggacttggtctactgaa
catataggagattcgttttataagacatttgattgtaagtgggggaatgaaaccttaact 
tggaatctaaaaaaaggacctaaaggtggaattatcgaagtatttattgatgataagtcg 
aaagggacttttgattgttacagcgctcatgcttcgacgcaaaaagtgattttagctaaa 
ggattatcaaaaggtaaacattcttttagaggagtttttaaatcgaaaaaacctggtatt 

gattataagaagtctaatccagtcatgtatgttggtacgagtaaaagtagtgttttaaat
ctaactgcagttcttaaaggtaaagatatttatcatgtatatgctgaatataagtctcca 
tattataagcaatatggtaaatcagaagcccctacaatatatgatgataatattacaagt 
caatcagagttaaagaagaaattaaaagaaacacttgatgacataccaacaatcgaagta 
gcaacgaattatttaggattagaaagtattcatgaaaataatactattcgatttatacac 

aaacctatcggatttaatactgatttaaaagttgtcaaacttactgaatatcaccccctt
gtttcgcagcctattgaagtggaattcagtaatgcacagaaagatattataaaaatgcaa 
tcacagttcaatcgtaggttaagaaaggttaataatcttatgaaaaaaggattcaaaact 
agtgactattctttaaatgtgttagaggaatataacgaaacagtaggaagtgtattgatt 
gatgagtaa 


Sequence 604 

VRILENLIFMNREGTFSEIVNDFDFGSFKYEYEQNNERSISLTAYKTNVNADIFDSLINE 
NYLVWKGQKYVIKSTELKYEEGVILNEIEAKHISMEFQNHYIPKDLDDESLNDEDETEAK
ISMKVKEYLDFAFKNNKLNFDYKLHGKFNESKYIEQLGDKNGLEHLIEGAEHFGYIFFAD 

NKTFHIYTPDNFYKKSDEILVYKYNNSSVSAKTITTELRTYIQGYGKKKSKSETKNYKPI
KPKDFSYSGNFNKEGTWSTEHIGDSFYKTFDCKWGNETLTWNLKKGPKGGIIEVFIDDKS
KGTFDCYSAHASTQKVILAKGLSKGKHSFRGVFKSKKPGIDYKKSNPVMYVGTSKSSVLN 
LTAVLKGKDIYHVYAEYKSPYYKQYGKSEAPTIYDDNITSQSELKKKLKETLDDIPTIEV
ATNYLGLESIHENNTIRFIHKPIGFNTDLKVVKLTEYHPLVSQPIEVEFSNAQKDIIKMQ

SQFNRRLRKVNNLMKKGFKTSDYSLNVLEEYNETVGSVLIDE*

Sequence 605 

Contig_0494_pos_3836_1974,
putative peptide of unknown function 
atgttattaactttagactttcctattcaaataggacacacatttagaaccaagatgata
aataattttagaacaatacttaattattataatgaattagatcatcagcatcgcgcacac 
acagaaactaagcatcatgcacatcaagccatgcaggttgattatagaaatacaaacgtt 
tctgcatttttagattatcttaacggtaatattaatgggcttgttttaggagcaaatgga 
gacggtatagctgaaacaaaacaagccagagtatcaatagatggtaccgtacatcccttg 
ttgcaagaaaggctgcttcatgactttttaggaattaacagaaaattagataaagaaata
cattctaatggtgcagttgattttatttggaatcctccatatataccaggaaatagattg 
ggagaaaatgggacaccaaataattgggaaccagaagcccatattgaagcgtttttaaac 
cctttagttgataatcaatacgttacaaaagaagttataggagaagatacatcaggaaaa 
tataatgtgtacaaatttacgtttgaaccacaaaattacaataaaacgttacttattact
tcatgtatacacggtaatgaaactactggattttttgatatgtgccatatactcaatcta
ttggtcaatcaatgggaaaagtatcctcaattaacttacttaagaaaaaatgtacgttta 
atttatgttcctatggttaacccgtggggattcgcaaatcaagaaagagagaatgtgaac 
aatgtagatttaaacagaaattttgattataactggaaggcaggtaaagggacagatcct 

gataaatctaacttcaaaggtaaaagtcctttttctgaaaaagaatcacaaaatatgcgt
agcttagttcaaagtatagataatttaactgctcacttagatttgcatgatattatttca 
gtaaataatgattactgtttattttatccgcgttgggccaatcaaaaaaataataatatg 
actcatcttattaacaatttaaaaagtaacggagacctcgttgtttggggttccagtaca 
ttatcatcttttagtaattgggtaggaatccgaaataaaacaacgtcatatctttcagaa 

ataaatgaaaaacgtgtcggtgaaaagaaaagtcccgaagaaatgagacgttcagtacgc
tgggtaggtaatgtaatttttagaatggcacaatttgaatcttatcaaaatggtcaaaca 
tcattagatcctttcattaaagtgatggtatatgatgatagatttaacaataaaacatct 
gaagtcattaccctacgtgcagaaaggaatgaatggcaacgtataatgatgagtcagcag 
cgtttcaaagttttagcaaatggatttgtagagctctatggatatgtgactataaacgtt 

gatagagatgtcacagtggggattaatcctaatattgttcagaattatcatccattcttt
ggatttaataaaagtagaaaacgtaatttattttcaattgaacatagactcaacaaagga 
aatacaactttccctatttacgctgctgctggagttcaaatgtcgacgattactgaacca 
ggtacaaaacgtactgatacagtaatgccggtactagatgttaagaaaaaaggtgctggt 
attgtaacaatcaaacaaattaaattatttgcgaagttcactcctacgcattctgctaat 

tccattcagatattaaaatctggagaatacggtaatcttaaagaagatacgttcacacaa
atttaccctaatactatatatgatgatgatttaagaaatgttataaatggggaggaaaaa 
taa 

Sequence 606 

MLLTLDFPIQIGHTFRTKMINNFRTILNYYNELDHQHRAHTETKHHAHQAMQVDYRNTNV
SAFLDYLNGNINGLVLGANGDGIAETKQARVSIDGTVHPLLQERLLHDFLGINRKLDKEI
HSNGAVDFIWNPPYIPGNRLGENGTPNNWEPEAHIEAFLNPLVDNQYVTKEVIGEDTSGK 
YNVYKFTFEPQNYNKTLLITSCIHGNETTGFFDMCHILNLLVNQWEKYPQLTYLRKNVRL 
IYVPMVNPWGFANQERENVNNVDLNRNFDYNWKAGKGTDPDKSNFKGKSPFSEKESQNMR 

SLVQSIDNLTAHLDLHDIISVNNDYCLFYPRWANQKNNNMTHLINNLKSNGDLVVWGSST
LSSFSNWVGIRNKTTSYLSEINEKRVGEKKSPEEMRRSVRWVGNVIFRMAQFESYQNGQT
SLDPFIKVMVYDDRFNNKTSEVITLRAERNEWQRIMMSQQRFKVLANGFVELYGYVTINV
DRDVTVGINPNIVQNYHPFFGFNKSRKRNLFSIEHRLNKGNTTFPIYAAAGVQMSTITEP 
GTKRTDTVMPVLDVKKKGAGIVTIKQIKLFAKFTPTHSANSIQILKSGEYGNLKEDTFTQ 

IYPNTIYDDDLRNVINGEEK*

Sequence 607 

Contig_0494_pos_1581_760,

putative peptide of unknown function 

gtgggtgacgcaataataaataaaattaatggtgcaactaaaattaaatatatccgtatg
tttgatgaattaaaaagacaaattaatgcacgagccactgaaatacaagaacaattagat 
aatttagaagattacgttgttaaagtgaaagatgcaagtgatgaaggaattacaaagatt 
cagattgaaacaaaaaaaggattggacaaacttaatcaacagcgtagtaaaagtttaaaa 
gacgtcgaggaatctcttaacgcggctaaaaatacaattcaaaatctttatgaagaatat 

50 gacaacgaaattgacacaaaaggaagtcaatatttaaaagatttaagaatcgaagttagg 
aatattgaaaatatattaagtcaagagggatacgtcacaattgatgaacatcgtaaaagc 
attactgaaatacaagaaaagttacctgaatcttcagactggattgaatatgatttgatt 
aatggagctataaaaaataggcattataaagctgaaggacaaaatggttttaattgcgct 
tataaaatcattcaacatcaagactataaggaagtgatgttaagaattaacgctgacaac 

55 tttaaaagtggaactgttatagcgaagttaccgagtgaactaattacaagtacgcaaact 
gcgttcctaagatcggtgcctgttaaagcttgtggtgctcaattaactattgaacctaat
ggagatgttaaagtttatatttctcagagcgatcagtggtcagtaagtcgtgaagcttat
atttacggagaaattagaatgatagataaaggaggtgaataa

Sequence 608

VGDAIINKINGATKIKYIRMFDELKRQINARATEIQEQLDNLEDYVVKVKDASDEGITKI
QIETKKGLDKLNQQRSKSLKDVEESLNAAKNTIQNLYEEYDNEIDTKGSQYLKDLRIEVR 
NIENILSQEGYVTIDEHRKSITEIQEKLPESSDWIEYDLINGAIKNRHYKAEGQNGFNCA 
YKIIQHQDYKEVMLRINADNFKSGTVIAKLPSELITSTQTAFLRSVPVKACGAQLTIEPN
GDVKVYISQSDQWSVSREAYIYGEIRMIDKGGE*

Sequence 609 

Contig_0494_pos_758_132,

is similar to (with p-value 3.0e-23)

>pir:pir|S41182|S41182 hypothetical protein 37.1 - phage SPP
1 >pir:pir|S43808|S43808 hypothetical protein 38 - phage SPP
1 >gp:gp|X67865|BSSPP1_10 B.subtilis phage SPP1 DNA sequence
coding for products required for replication initiation. NI
D: g472886.

gtgatggatacttataaatctatgactgaacttgtgaggaatgaaaaagattggatgatt 
gagacacaagatagaaatagtaaatcacttataactgctatacacggaggcggtatagag 
tgtggcacttctgaattagcgttattggttgcagaattatcgaatgcaaactattttact 
tttaaaggtttaaaaccgaaaaacaatagaactctacacgtcacttcaacaaattatgat 

20 aaccccaatttattatattggaatcaatttatgaatgtaacgatagccgtacatggttat 
tcgagcaatcaagcaaatagttatattggtggattggatgaaagacttatatctcttatt 
actcacaatttaaaagtttcaggttttaatgtggaagctgctcctgacagaattgcgggc 
agagaaattaataatataaccaacaaaaatgcctatggcatgggtgtacagattgaaata 
tcgactcaacaaagaaaagaattttttagtcgaaacgattttagtaaaaagaatagagaa 

25 aatacacataattggacagaagatatgtattattatgctaatgctatttgtgctgcactt 
aatgatagaaagtgggtagaaacatga 

Sequence 610 

VMDTYKSMTELVRNEKDWMIETQDRNSKSLITAIHGGGIECGTSELALLVAELSNANYFT 
FKGLKPKNNRTLHVTSTNYDNPNLLYWNQFMNVTIAVHGYSSNQANSYIGGLDERLISLI
THNLKVSGFNVEAAPDRIAGREINNITNKNAYGMGVQIEISTQQRKEFFSRNDFSKKNRE
NTHNWTEDMYYYANAICAALNDRKWVET*

Sequence 611 
Contig_0495_pos_3147_2617,

putative peptide of unknown function 

atggtgctcgtacaatttccaccttggtttgattgtaacgtccaaaatataaattacatc 
ttatatgtgagaaaacaattaactgatattccgatgagcattgaatttagacatcaatca 
tggtttgacaatcagtataaagaacaaactttatccttcttaacacaacatcaaatcatt 

40 catgcagtggtagatgaacctcaagttaaagaggggagcgttcctttagtaaataggatt 
actagtgaaattgcttttgtacgttatcatggacgtaatcattatggttggactaaaaaa
gatatgactgatcaagaatggcgagatgtaagatatttatatgattatagcgatgatgag 
ttagctgacttggctcgtaaagtcgaaatacttaatcaaaaggctaagaaagtatatgta 
atttttaataataactctggcggtcatgcagctaataatgctaaaaagtatcaaaatatt 

45 ttagacattgattatgaaggtttagcaccgcaacaattaaaactattttaa 

Sequence 612 

MVLVQFPPWFDCNVQNINYILYVRKQLTDIPMSIEFRHQSWFDNQYKEQTLSFLTQHQII 
HAVVDEPQVKEGSVPLVNRITSEIAFVRYHGRNHYGWTKKDMTDQEWRDVRYLYDYSDDE 
LADLARKVEILNQKAKKVYVIFNNNSGGHAANNAKKYQNILDIDYEGLAPQQLKLF*

Sequence 613 

Contig_0495_pos_2265_1777,

putative peptide of unknown function

gtgtttatgatatttgtgtcgatattactgatgattcgtcacaaaatcaaaccttttaaa
atttttgacaaacctaaatatgcgcgtacatatgttgatgctgaagggaaaacataccgt 
tatagtgtaccacccctgtttgcttttataacaacgttatttattgggctattaacagga 
ctgtttggcataggtggaggtgcattgatgacccctcttatgctcatcgtctttagattt 
ccaccacatgttgcagtaggcacaagtatgatgatgattttcttttcaagtgtgatgagt
tcaatagggcacatctttcaaggacatgtggcttggggctattctatcattctcattatt
tcaagtgttataggtgcacaaataggtgtgagggtcaatcgatctatgaaatccgacaca
gttgtaatgttattgagaacagtaatgcttatcatgggtgtatatttaatcattaaatct
tttatttaa 

Sequence 614 

VFMIFVSILLMIRHKIKPFKIFDKPKYARTYVDAEGKTYRYSVPPLFAFITTLFIGLLTG
LFGIGGGALMTPLMLIVFRFPPHVAVGTSMMMIFFSSVMSSIGHIFQGHVAWGYSIILII
SSVIGAQIGVRVNRSMKSDTVVMLLRTVMLIMGVYLIIKSFI* 

Sequence 615 

Contig_0495_pos_1181_372,

putative peptide of unknown function 

atgagccatgtcggtatcttttttgatgaaaagttatgccaagagattccggaaatagat 
15 gttatctttggtagtcatacgcatcatcattttgaacatggagaaataaacaatggtgtt 
ttgatggcagctgccggaaaatatggctattatttaggtgaagttaatattacgattgaa 
aatggaaaaatcgttgataaaatcgccaaaattcatcctattgaaacacttcccttagtc 
gagacacattttgaagaagaaggaagagcacttctaagtaaaccagtagttaatcatcat
gtgaacttagtcaaaagaacagatgttgttacaagaacatcgtatttactggctgaaagt 
20 gtatatgagttttcaagggctgattgtgcaatcgtaaatgctggacttatagttaatggc 
attgaagctgataaagtgacggaatatgatatacatcgcatgttaccccatccaatcaat 
attgtaagagttcgattaaccggtaaacaattaaagcaagtgattcaaaaaagccaaaag 
caagaatatatgcacgaacatgcacaaggtcttggttttagaggggatatatttggagga 
tatattttatataatctaggctttattgagtcagaagaccgttattttataggcgatgaa
gagattcaaaatgataaacaatatacgttaggtactgttgatatgtatacatttggaaga
tatttcccattgctaaaggggttatctacagattatattatgcctgaatttttacgtgat 
atttttaaagagaaattactaaaattataa 

Sequence 616 

MSHVGIFFDEKLCQEIPEIDVIFGSHTHHHFEHGEINNGVLMAAAGKYGYYLGEVNITIE
NGKIVDKIAKIHPIETLPLVETHFEEEGRALLSKPVVNHHVNLVKRTDVVTRTSYLLAES
VYEFSRADCAIVNAGLIVNGIEADKVTEYDIHRMLPHPINIVRVRLTGKQLKQVIQKSQK
QEYMHEHAQGLGFRGDIFGGYILYNLGFIESEDRYFIGDEEIQNDKQYTLGTVDMYTFGR 
YFPLLKGLSTDYIMPEFLRDIFKEKLLKL* 

Sequence 617 

Contig_0496_pos_0_1167,

is similar to (with p-value 0.0e+00)

>sp:sp|P39772|SYN_BACSU ASPARAGINYL-TRNA SYNTHETASE (EC 6.1.
1.22) (ASPARAGINE--TRNA LIGASE) (ASNRS). >gp:gp|L47709|BACYP
IA_24 Bacillus subtilis (clone YAC15-6B) ypiABF genes, qcrAB
C genes, ypjABCDEFGHI genes, birA gene, panBCD genes, dinG g
ene, ypmB gene, aspB gene, asnS gene, dnaD gene, nth gene an
d ypoC gene, complete cds's. NID: g1146223. >gp:gp|Z99115|BS
UB0012_176 Bacillus subtilis complete genome (section 12 of
21): from 2195541 to 2409220. NID: g2634478.

atgaaaactacgattaaacaagcgaaaaaacatcttaaccaagaagtaacaattggtgct 
tggttaactaataaacgttcaagtggtaaaatagcgtttttacaattacgcgatggtaca 
ggatttatgcaaggagtagtagtaaaatctgaagtagatgaagaaacatttcaactagca 

50 aaagatataactcaagaatcatctttatacatcacaggaacgattacagaagataatcgt 
tctgatttaggctacgaaatgcaagttaaatcaatcgaaattgtacatgaagcacacgat 
tatcctattacaccaaagaatcatggaacagaatttttaatggatcatcgtcacttatgg 
ttacgttcaaaaaaacaacatgctgtcatgaaaataagaaatgaaattatccgtgcaaca 
tatgagtttttcaatgaaaatggcttcactaaaattgatccacctattttaacagcaagt 

55 gcaccagagggaacaagtgagttattccatacaaaatatttcgatgaagatgcattctta 
tcacaaagtgggcagttgtatatggaagcagccgcaatggctcacggacgtgttttttca 
tttggcccaacttttcgtgcagaaaaatctaaaacacgccgtcatttaattgaattctgg 
atgattgaaccagaaatggcctttacaaatcatgcagaaagcttagaaatacaagaacag 
tatgtgtctcacattgttcaatctgttttaaatcattgccaattagaactcaaagcttta
gatagagatacaactaaactagaaaaagttgctacacctttccctagaatttcttatgat
gatgctatcgaattcttgaaaaaagagggattcgatgatattgaatggggtgaagacttt 
ggtgcacctcatgaaacagccatcgctaatcactatgatttaccagtattcattacaaat 
tatccaactaaaattaaaccattctatatgcaaccaaatccagacaatgaagatacagta 
ttatgtgctgatttaattgcgcctgaaggttacggtgaaattattggtggttccgaacgt
attaatgatttagaattattagaacaacgcattaatgagcacgaattggatgaggaaagt 
tatagctattatttagatttacgtCTT 

Sequence 618 

10 MKTTIKQAKKHLNQEVTIGAWLTNKRSSGKIAFLQLRDGTGFMQGVVVKSEVDEETFQLA 
KDITQESSLYITGTITEDNRSDLGYEMQVKSIEIVHEAHDYPITPKNHGTEFLMDHRHLW 
LRSKKQHAVMKIRNEIIRATYEFFNENGFTKIDPPILTASAPEGTSELFHTKYFDEDAFL 
SQSGQLYMEAAAMAHGRVFSFGPTFRAEKSKTRRHLIEFWMIEPEMAFTNHAESLEIQEQ 
YVSHIVQSVLNHCQLELKALDRDTTKLEKVATPFPRISYDDAIEFLKKEGFDDIEWGEDF 

GAPHETAIANHYDLPVFITNYPTKIKPFYMQPNPDNEDTVLCADLIAPEGYGEIIGGSER
INDLELLEQRINEHELDEESYSYYLDLRL 

Sequence 619 

Contig_0497_pos_6106_5558,

is similar to (with p-value 4.0e-90)

>sp:sp|P51183|PT1_STAAU PHOSPHOENOLPYRUVATE-PROTEIN PHOSPHOT
RANSFERASE (EC 2.7.3.9) (PHOSPHOTRANSFERASE SYSTEM, ENZYME I
). >gp:gp|X93205|SAPTSHI_2 S. aureus ptsH and ptsI genes. NID
: g1070384.

25 atgttcccaatggtagcaacaattaaagaattccgtgacgctaaatcaatgcttcttgaa 
gagaaagaaaatcttcttcgcgaaggttacgaagtttcagatgatattgaattaggtatt 
atggttgaaattccagctaccgcggcacttgctgatgtatttgctaaagaagtagatttc 
tttagtataggaacgaatgacttaattcaatacacattagctgctgaccgtatgtctgaa 
cgagtttcatacttatatcaaccatataatccttcaattttacgattagttaaacaagtt 

30 attgaagcttctcataaagaaggtaaatggactggtatgtgtggtgaaatggctggagat 
caaacagctgtgcctttattattaggtttaggtttagatgagttctcaatgagtgcgact 
tctatcctaaaagctagaagacaaatcaatggtttaagtaaaaatgaaatggctgaactc 
gctaatagagctgttgaatgctcaacgcaagaggaagtcgttgatttagttaaccaatta 
gctaaataa 

Sequence 620 

MFPMVATIKEFRDAKSMLLEEKENLLREGYEVSDDIELGIMVEIPATAALADVFAKEVDF 
FSIGTNDLIQYTLAADRMSERVSYLYQPYNPSILRLVKQVIEASHKEGKWTGMCGEMAGD 
QTAVPLLLGLGLDEFSMSATSILKARRQINGLSKNEMAELANRAVECSTQEEVVDLVNQL 
AK*

Sequence 621 

Contig_0497_pos_2347_1763,

is similar to (with p-value 3.0e-69)

>sp:sp|P39760|YKQB_BACSU HYPOTHETICAL 24.3 KD PROTEIN IN KIN
C-ADEC INTERGENIC REGION (ORF4). >gp:gp|AF012285|AF012285_27
Bacillus subtilis mobA-nprE gene region. NID: g3282109. >gp
:gp|D37799|BACAMOKOOO_6 Bacillus subtilis genes for ampS, mr
eBH, orfl, kinC, orf3, orf4 and orf5. NID: g520838. >gp:gp|Z
99111|BSUB0008_123 Bacillus subtilis complete genome (sectio
n 8 of 21): from 1394791 to 1603020. NID: g2633699.
atggatgttatggctattgatagagacgaaaatcgtgttaacgaatatagtgatatagca 
acacatgcagttgttgctgatacaactgatgaggcagtaatgaagagtttaggaatacgt 
aatttcgatcatgttattgtcgctattggtgagaatatacaatctagtacactaacgacg 

55 ttaattcttaaagaattaggtgttaaaaaggttactgctaaagcccaaaatgattatcat 
gctaagattttaaataaaataggtgccgatactgttgtgcaccctgaaagagatatggga 
agacgtattgctcataatgttgctagtgctagtgtccttgactacttggaacttgctgat 
gagcattcaatcgttgaattaaaatctacagaaaaaatggcaggacaaacaattattgaa 
ttagatattcgagctcaatatggtattaacattatagcaattaaaagagctaaagaattt
atagtctctccagaccctaacatcaatattgaaataggggacattttaattatgattggt
catgataatgacttaggtcgctttgaaaaaaatataagcaagtaa 

Sequence 622 

5 MDVMAIDRDENRVNEYSDIATHAVVADTTDEAVMKSLGIRNFDHVIVAIGENIQSSTLTT 
LILKELGVKKVTAKAQNDYHAKILNKIGADTVVHPERDMGRRIAHNVASASVLDYLELAD 
EHSIVELKSTEKMAGQTIIELDIRAQYGINIIAIKRAKEFIVSPDPNINIEIGDILIMIG
HDNDLGRFEKNISK* 

Sequence 623

Contig_0498_pos_2017_3027,

is similar to (with p-value 8.0e-80)

>sp:sp|P41006|PYRP_BACCL URACIL PERMEASE (URACIL TRANSPORTER
). >pir:pir|S38893|S38893 uracil transport protein - Bacillu
s caldolyticus >gp:gp|X76083|BCPYRQP_2 B.caldolyticus (DSM40
5) pyrR, pyrP and pyrB (partial) genes. NID: g431229.

atgctggttgcattatttatgagtggattaatgtacgtgattataggtattttcattaaa
ttgagtggaacacattggttaatgcacttgttaccaccagtagttgtcggaccagtaata 
atggtcattgggttaagtttagctcctacagcagtaaacatggccatgttcgaaaattct 

20 gctgaaatgaaagggtataacttaagttacttaattgttgctttgattacattagcagta 
accatcatcgtccaaggattcttcaaaggatttttatcactaatacctgtacttataggt 
attatagtgggatatattgtatccattttcatgggcatagttaaatttgctccaatagca 
caagcgaaatggatagattttcctcatatttatctaccatttaaagattacacaccatct 
tttcatttaggactcattctcgtgatgatacccgtggtgtttgtgacggtaagtgaacat 

25 attggtcatcaaatggtaattaataaaatagtaggacgcaatttctttgaaaatccaggt 
ttagataaatcaatcattggtgatggtgtttcaactatgtttgcaagtatgataggaggt 
cctcctagtacaacttatggtgaaaatataggtgtactagcgatcaccaaaatatatagt 
atttacgttattggtggtgcggcagttatagctatcattcttgcatttattggtaagttc 
actgctttaatatcttcaatacctacgccagtgatgggtggtgtctcaattttattattc 

30 ggtattatagcagctagtggtttaagaatgcttgttgaaagtcaagtagatttcgcaagc 
aatcgcaacttggttatagcatcagttgtgcttgttgtcgggattggtaatcttcttatc 
aatttaaaaggcataggtatcaatttacaaattgaaggaatggcattatcagcactttca 
ggaataatattaaatttaattttgccaaaagataaaaaccaaataaattaa 

Sequence 624

MLVALFMSGLMYVIIGIFIKLSGTHWLMHLLPPVVVGPVIMVIGLSLAPTAVNMAMFENS
AEMKGYNLSYLIVALITLAVTIIVQGFFKGFLSLIPVLIGIIVGYIVSIFMGIVKFAPIA
QAKWIDFPHIYLPFKDYTPSFHLGLILVMIPVVFVTVSEHIGHQMVINKIVGRNFFENPG
LDKSIIGDGVSTMFASMIGGPPSTTYGENIGVLAITKIYSIYVIGGAAVIAIILAFIGKF
TALISSIPTPVMGGVSILLFGIIAASGLRMLVESQVDFASNRNLVIASVVLVVGIGNLLI
NLKGIGINLQIEGMALSALSGIILNLILPKDKNQIN*

Sequence 625 

Contig_0498_pos_3053_3934,

is similar to (with p-value 2.0e-90)

>sp:sp|P05654|PYRB_BACSU ASPARTATE CARBAMOYLTRANSFERASE (EC
2.1.3.2) (ASPARTATE TRANSCARBAMYLASE) (ATCASE). >pir:pir|A25
015|OWBSAC aspartate carbamoyltransferase (EC 2.1.3.2) catal
ytic chain - Bacillus subtilis >gp:gp|M13128|BACPYRB_1 B.sub
tilis pyrB gene encoding aspartate transcarbamoylase, comple
te cds. NID: g143383. >gp:gp|M59757|BACPYROP_3 Bacillus subt
ilis pyrimidine biosynthetic (pyr) gene cluster (pyrR, pyrP,
pyrB, pyrC, pyrAA, pyrAB, pyrD, pyrF and pyrE) genes, compl
ete cds. NID: g387576. >gp:gp|Z99112|BSUB0009_20 Bacillus su
btilis complete genome (section 9 of 21): from 1598421 to 18
07200. NID: g2633902.

atggaacacttattatcaatggagcatttatctaattcagaaatttatgatttaattact 
atcgcttgccaattcaaatctggtgagcgaccattacctcaatttaacggtcaatacgta 
tcaaacttattcttcgaaaattcaacgcgaacaaagtgtagctttgagatggcagaacaa
aaattaggattaaaacttattaattttgaaacaagtacatcatctgtaaaaaagggtgag
tcactttatgacacatgtaaaacacttgaaagtataggtgttgatttacttgtcatacgt 
cactcccaaaattcttattacgaagaactggatcaattaaatattccaattgctaatgca 
ggtgatggaagtggacaacatcctactcagagtttattagacataatgacaatatatgaa 
gaatatggttcgtttgaaggtttgaatattctaatatgtggggacattaaaaattctcgt
gtcgcaagaagtaattatcatagtttaacatcattaggtgccaacgtaatgttctcaagt 
ccaaaagaatgggtagataatacattagaggcgccttatgttgaaattgatgaagtcatt 
gataaagtagatattgttatgttgcttagagttcaacatgaaagacatggaatttcaggt 
gaagctaactttgctgctgaagaatatcatcaacaatttggtttaacacaggctagatat 
gataaattaaaagaggaagccattgtaatgcatccagctcctgtaaatagaggtgttgaa
attaaaagcgagctagttgaagcacctaagtctcgaatatttaagcagatggaaaatgga 
atgtatttaagaatggcagtaataagtgcgcttttacaatag 

Sequence 626 

MEHLLSMEHLSNSEIYDLITIACQFKSGERPLPQFNGQYVSNLFFENSTRTKCSFEMAEQ
KLGLKLINFETSTSSVKKGESLYDTCKTLESIGVDLLVIRHSQNSYYEELDQLNIPIANA
GDGSGQHPTQSLLDIMTIYEEYGSFEGLNILICGDIKNSRVARSNYHSLTSLGANVMFSS
PKEWVDNTLEAPYVEIDEVIDKVDIVMLLRVQHERHGISGEANFAAEEYHQQFGLTQARY
DKLKEEAIVMHPAPVNRGVEIKSELVEAPKSRIFKQMENGMYLRMAVISALLQ*

Sequence 627 

Contig_0498_pos_3952_0,

is similar to (with p-value 0.0e+00)

>sp:sp|P46538|PYRC_BACCL DIHYDROOROTASE (EC 3.5.2.3) (DHOASE
). >pir:pir|S34319|S34319 dihydroorotase (EC 3.5.2.3) - Baci
llus caldolyticus >gp:gp|X73308|BCPYR_2 B.caldolyticus pyrim
idine biosynthesis genes. NID: g312439.

atgaaattaattaaaaacggaaaaatcttaaaaaacggtatcctaaaagacacagaaatt 
ttaatcgacggtaaacgtattaaacaaattagtagtaaaattaatgcttcatcttcaaat 

30 attgaagttattgatgcaaaaggaaatttaattgctcccggttttgtagatgttcatgtg 
cacctacgtgaaccaggtggtgaacataaagaaacaattgaaagtggtacaaaagccgct 
gcaagaggtggttttactacagtatgtcctatgcctaatacaagacctgtaccagataca 
gttgaacatgttagagaattaagacaacgaatttctgaaacagcacaagttagggtgttg 
ccttatgctgctattactaagagacaagcaggtactgaacttgttgattttgaaaaatta 

35 gcactagaaggtgtgtttgcatttactgacgatggtgtgggagttcaaacagcaagtatg 
atgtatgctgctatgaagcaagctgcaaaagttaaaaaaccgattgtcgcacactgtgaa 
gataatagcttaatctatggtggtgcaatgcataaaggtaaacgtagtgaagaattaggc 
atacctggtattccaaatattgctgaatctgtacaaattgctagagatgtattattggct 
gaagcaactggttgtcactatcatgtgtgtcatgtttcaactaaggaaagtgttcgagta 

40 atcagagacgctaaaaaagctggtatccatgtaacagcagaagttacaccacatcattta 
ttattaactgaaaatgatgttcctggcgatgattcaaactacaaaatgaatccaccatta
agaagtaatgaagatagagaagcacttttagaaggcttattagatggaacaattgattgt 
attgcaacggatcatgcacctcacgctaaagaagaaaaagcacaacctatgacaaaagca 
cctttcggcatcgtaggtagtgaaacagcattcccattactttatacacactttgtaaga 

45 cgaggtaattggtcactgcaacaattagttgattatttcactattaaaccagctactatt 
ttcaacttaaattatggaaaattacacaaagat 

Sequence 628 

MKLIKNGKILKNGILKDTEILIDGKRIKQISSKINASSSNIEVIDAKGNLIAPGFVDVHV 
HLREPGGEHKETIESGTKAAARGGFTTVCPMPNTRPVPDTVEHVRELRQRISETAQVRVL
PYAAITKRQAGTELVDFEKLALEGVFAFTDDGVGVQTASMMYAAMKQAAKVKKPIVAHCE 
DNSLIYGGAMHKGKRSEELGIPGIPNIAESVQIARDVLLAEATGCHYHVCHVSTKESVRV 
IRDAKKAGIHVTAEVTPHHLLLTENDVPGDDSNYKMNPPLRSNEDREALLEGLLDGTIDC 
IATDHAPHAKEEKAQPMTKAPFGIVGSETAFPLLYTHFVRRGNWSLQQLVDYFTIKPATI 
FNLNYGKLHKD

Sequence 629 

Contig_0499_pos_4575_5165,

is similar to (with p-value 3.0e-47)

>sp:sp|P42954|TAGH_BACSU TEICHOIC ACID TRANSLOCATION ATP-BIN
DING PROTEIN TAGH. >gp:gp|U13832|BSU13832_2 Bacillus subtili
s 168 highly hydrophobic integral membrane protein (tagG) ge
ne and ATP-binding protein (tagH) gene, complete cds. NID: g
755151. >gp:gp|Z99122|BSUB0019_67 Bacillus subtilis complete
genome (section 19 of 21): from 3597091 to 3809700. NID: g2
636029.

atgattggtggctctatttcaccaagttccggtgaaataacgagacatggtgatgtgagt 
gtcatcgctattaatgcaggactaaatggacaattgacaggtgtagaaaatattgaattt 

10 aaaatgctctgcatgggctttaaaaggaaagaaattaaaaaattaatgccggaaattata 
gaatttagtgaactcggcgaatttatttatcaacctgttaaaaaatattcaagtggtatg 
cgtgcaaaacttggattttcaattaatattactgttaatcctgacatattagttattgac 
gaagcattatcagtaggcgatcaaacatttactcaaaaatgtttagataaaatttatgaa 
tttaaagcggctaaaaaaacaatattttttgttagtcataatattagacaagtgcgtgaa 

15 ttttgtacaaaaatcgcttggattgagggcggtaaactaaaagaattcggcgaacttgaa 
gaagtattacctgattatgaggcgtttcttaaaacttttaagaaaaaatctaaagcagaa 
caaaaggaatttagaaataaattagatgagtcacgttttgtcgtaaaataa 

Sequence 630 

MIGGSISPSSGEITRHGDVSVIAINAGLNGQLTGVENIEFKMLCMGFKRKEIKKLMPEII
EFSELGEFIYQPVKKYSSGMRAKLGFSINITVNPDILVIDEALSVGDQTFTQKCLDKIYE
FKAAKKTIFFVSHNIRQVREFCTKIAWIEGGKLKEFGELEEVLPDYEAFLKTFKKKSKAE
QKEFRNKLDESRFVVK*

Sequence 631

Contig_0499_pos_6176_6922,

putative peptide of unknown function 

atgatttatttacaaaattttattactacaactatccaattaaatatctatcttatttta 
gttattggactgctttacgtaatcatccactattatagaaataaaggtgttaacgctttc 

30 ttagatatttatttaaattatataccggtacttacacacgaatttggccacgtcttattt 
aacaaactcgctggtggaaaggccaaagatcttgtcattgtgacaagccctagagaaaga 
aaagtcacttcacaacaaggctatgcgattacacaatctaaaggatacttaggtcagttt 
attacaactataggtgggtatcttatgccaccattgatgtttttaactggattggtatct 
attcactatcaatatccaagtatatttattactatatatttatttatttttatatattat 

35 ttctttattacttcccgtaaactatcacctttgattgtcattatactcatctcaagttta 
ctctatttagtatttaaacaagaccatcaatggttcatttacgacattgtcacattaagt 
taccattttattttaggcgtacttttaggtgaaatattacaatcctcatggacgattttt 
cgtcttacctttcaacgacctaaaccttcttgggatggcagtgctttaacgaaagttact
cgagtacccacctttatctttagtttagtgtggatattattcaatctctatactgtgtat 

40 ttattaatcaaatacacaatactataa 

Sequence 632 

MIYLQNFITTTIQLNIYLILVIGLLYVIIHYYRNKGVNAFLDIYLNYIPVLTHEFGHVLF
NKLAGGKAKDLVIVTSPRERKVTSQQGYAITQSKGYLGQFITTIGGYLMPPLMFLTGLVS
IHYQYPSIFITIYLFIFIYYFFITSRKLSPLIVIILISSLLYLVFKQDHQWFIYDIVTLS
YHFILGVLLGEILQSSWTIFRLTFQRPKPSWDGSALTKVTRVPTFIFSLVWILFNLYTVY 
LLIKYTIL* 

Sequence 633 
Contig_0499_pos_12156_11095,

putative peptide of unknown function 

gtgataatcgacagattgcagagttatgttaccttatttggagagagtccattccaaaaa 
atgatttctaaaaatgatgaagataaagttactgagagtaaacctaagcgtagcttatat 
gcactaatcatgactctatgtggtgtacatggaaccatttcactcgcaatcgccttaacc 
55 ttgccatatttattagcaaatcatgaaacatttgcttatcgaaatgatttattatttatt 
gcttccggaatggtaatattaagtttaattattgcacaagtcattctgcctttagtaacg 
cctgatagccctgaagtgaagataggtaatatgtcatttaaagaggcgagaatctacatt 
ttagaacatgttatcgattacctaaatcaaaaatcgacgtttgaaacgagttaccgttat 
ggaaacgtcattaaagattaccacgataaactcacatttttaaaaacggttgaaaaggaa
gatgaaaactccaaagaactagaacgacttcaaaagattgcatttaacgtagaaacaaaa
acgctagagaaattggttgatgatggcgaaattactgagagtgttcttgaaaactatatg 
cgttacgctgaacgaacagaagtgtataaacaagcttcgttattaagacgaattattgtt 
ggtttaagaggaatgctattgaaacgtcgtgtaaaaacaaaaattaattcggcatcatct 
5 cttagtgttactgataacttattagaattgggtaaaatcaataagcttgttcattataac 
gtcgtaagtcgtttagccaaagaagctactactgataataaactagaagtaggtatgatt 
tgcgatggatatctgatgagaatagataacttaacaccaaacaatttctttaattccaga 
catgaagatacacttaccaaaattaaattaaatgctttaagagaacaacgtcgtattcta 
agagaactaattgaaaatgatgagataacagaaggtactgcattaaaattaagagaatcc 
10 attaattatgatgaaatggtaattgtagatagtatgacataa 

Sequence 634 

VIIDRLQSYVTLFGESPFQKMISKNDEDKVTESKPKRSLYALIMTLCGVHGTISLAIALT 
LPYLLANHETFAYRNDLLFIASGMVILSLIIAQVILPLVTPDSPEVKIGNMSFKEARIYI 
15 LEHVIDYLNQKSTFETSYRYGNVIKDYHDKLTFLKTVEKEDENSKELERLQKIAFNVETK 
TLEKLVDDGEITESVLENYMRYAERTEVYKQASLLRRIIVGLRGMLLKRRVKTKINSASS 
LSVTDNLLELGKINKLVHYNVVSRLAKEATTDNKLEVGMICDGYLMRIDNLTPNNFFNSR 
HEDTLTKIKLNALREQRRILRELIENDEITEGTALKLRESINYDEMVIVDSMT*

Sequence 635 

Contig_0499_pos_5935_5243,

is similar to (with p-value 2.0e-31)

>sp:sp|P27620|TAGA_BACSU TEICHOIC ACID BIOSYNTHESIS PROTEIN
A. >pir:pir|B49757|B49757 techoic acid synthesis protein tag
A - Bacillus subtilis (strain 168) >gp:gp|M57497|BACTAGABCD_
2 B.subtilis tagA, tagB, tagC and tagD genes, complete cds.
NID: g143722. >gp:gp|Z99122|BSUB0019_72 Bacillus subtilis co
mplete genome (section 19 of 21): from 3597091 to 3809700. N
ID: g2636029.

atgctggaaatggtagaaaatattaaacaattcatatctagcaatacagatgataattta 
tttatagtgactgctaatcctgaaatcgtggattatgcaactgaacatgagctatataga
aatttaattaatcaagctgattatgtagttccagatggtacaggaatagtaaaagcttca 
aagcgattaaaacagcccttaaaacggcgtgtgccaggaatagaacttcttgaagaatgt 
ctgaaaatagcacatgtcagccatcagcgcgtatatctgcttggatctaaaaatgaaatt 
gttgagtcagcagagaaaaaacttcaatctcaataccctaatatccactttgcacatcat
catggctatattcatctagaagatgaaacagtcataaaacgtataacaagttttaatccc 
gattacatttttgtaggaatgggatttccaaagcaagaacaatggattcaaaagcataag 
gacaagtttaagcacactgtgatgatgggcgtaggtgggtcgtttgaagtattcagtggc 
tcaaagaaaagagcacctcaaatatttagaaagttaaatattgagtgggtatatcgtgtg 
cttattgattggaaacgcattgggagaatgataagtattcctaaatttatgttaaaggta 
gcaatacaaaaatataaaatgaaatcaaaataa 

Sequence 636 

MLEMVENIKQFISSNTDDNLFIVTANPEIVDYATEHELYRNLINQADYVVPDGTGIVKAS
KRLKQPLKRRVPGIELLEECLKIAHVSHQRVYLLGSKNEIVESAEKKLQSQYPNIHFAHH
HGYIHLEDETVIKRITSFNPDYIFVGMGFPKQEQWIQKHKDKFKHTVMMGVGGSFEVFSG
SKKRAPQIFRKLNIEWVYRVLIDWKRIGRMISIPKFMLKVAIQKYKMKSK*

Sequence 637 
Contig_0499_pos_3802_3227,

is similar to (with p-value 3.0e-29)

>sp:sp|P42953|TAGG_BACSU TEICHOIC ACID TRANSLOCATION PERMEAS
E PROTEIN TAGG. >gp:gp|U13832|BSU13832_1 Bacillus subtilis 1
68 highly hydrophobic integral membrane protein (tagG) gene
and ATP-binding protein (tagH) gene, complete cds. NID: g755
151. >gp:gp|Z99122|BSUB0019_68 Bacillus subtilis complete ge
nome (section 19 of 21): from 3597091 to 3809700. NID: g2636
029.

atgtggttctttattaatcaaggtgtcctagaaggaactaaatcaatctcacagaaattc
aatcaagtggcaaagatgaatttcccactctcaatcattcctacttatattgtaacaagt
aggttctatggtcatttaggattattagcaattattataatagcttgtatgttcaatgga 
attatcccttcaattcacattgtacaattacttatatatgtaccttttgcatatttgcta 
acatcgtcggtggcacttttaacatccactttggggattttaattagagatacgcagatg 
attatgcaagcattaatgagaatattgttttatatgtctccaattttatgggtgccaaaa
aatcacggcgtaagtggtttgattcatcaaattatgttatttaatccagtatattttatc
gcagaatcataccgagcagcgatattgttccatcaatggtatttcatagatcattggaag 
ttaatgctatataacgttattatcattctcttattctttatagtaggttctattttacat 
agacgctatagagatcactttgcggacttcttgtaa 

Sequence 638 

MWFFINQGVLEGTKSISQKFNQVAKMNFPLSIIPTYIVTSRFYGHLGLLAIIIIACMFNG 
IIPSIHIVQLLIYVPFAYLLTSSVALLTSTLGILIRDTQMIMQALMRILFYMSPILWVPK 
NHGVSGLIHQIMLFNPVYFIAESYRAAILFHQWYFIDHWKLMLYNVIIILLFFIVGSILH
RRYRDHFADFL*

Sequence 639 

Contig_0499_pos_2865_1891,

is similar to (with p-value 4.0e-36)

>sp:sp|P27621|TAGB_BACSU TEICHOIC ACID BIOSYNTHESIS PROTEIN
B PRECURSOR. >pir:pir|C49757|C49757 techoic acid synthesis p
rotein tagB - Bacillus subtilis (strain 168) >gp:gp|M57497|B
ACTAGABCD_3 B.subtilis tagA, tagB, tagC and tagD genes, comp
lete cds. NID: g143722. >gp:gp|Z99122|BSUB0019_73 Bacillus s
ubtilis complete genome (section 19 of 21): from 3597091 to
3809700. NID: g2636029.

gtgcttttaaatatggtttttaaaccgtttaatataaattcgaagcacattgtgataatg 
atgacctttaagcaagatatactgcctattatagaggccttatgtagtgaaggataccat 
gtgacggttataggtaaaaaaatatatcagaaagatattaataacattaatcatgcatat 

30 tttatacctgccggaaataaatatattatgagacatatgaaagtattaagtaaagcaaag 
gttattattttagatacgtattatttaatgatgggtggctatcagaagaaaaaagggcaa 
actgttattcaaacatggcatgctgctggtgcgcttaaaaattttggcttaactgatcat
caagttgatttaaaaaataaggctatggtaagacaatacaaaaaagtttatgatgctacc
gattattatttggtaggtggggagaaaatggctcaatgttttatacaatcgtttgatgca 

35 tctccatcgcaaatgttaaagtttggacttccaagactgacccaatactttagaagcaat 
cttaagttagaacaacaacgattaaaaaagaaatatcatattacaaataaactcgcagta 
tatgttccgacttatagagaaggtcaagtagcacaacgtactattgataaagaaaacttt 
gaacggcacttgccgaattatacgttattgagtcatttgcatccttcgactgttgattgt 
caaacttctcattcaatcgatgttacttcattgttaattatggcggatattattataagt 

40 gattatagctcattacctattgaagcaagcgcacttaataaaccgacacttatttataat 
tatgatgaacagcaatatgaaaaagtaagaggattgaatgaattttattatgctattcca 
gaacgatacaaaatgagtaatgaagagtcaattatacaagcgatacaggataacgatgag 
caatttcaatcttag 

Sequence 640

VLLNMVFKPFNINSKHIVIMMTFKQDILPIIEALCSEGYHVTVIGKKIYQKDINNINHAY
FIPAGNKYIMRHMKVLSKAKVIILDTYYLMMGGYQKKKGQTVIQTWHAAGALKNFGLTDH
QVDLKNKAMVRQYKKVYDATDYYLVGGEKMAQCFIQSFDASPSQMLKFGLPRLTQYFRSN
LKLEQQRLKKKYHITNKLAVYVPTYREGQVAQRTIDKENFERHLPNYTLLSHLHPSTVDC
QTSHSIDVTSLLIMADIIISDYSSLPIEASALNKPTLIYNYDEQQYEKVRGLNEFYYAIP
ERYKMSNEESIIQAIQDNDEQFQS*

Sequence 641 

Contig_0499_pos_1878_1240,

is similar to (with p-value 4.0e-66)

>gp:gp|AF008219|AF008219_3 Borrelia afzelii R-IP3 chromosome
right end, arcA and arcB genes, complete cds. NID: g2697111

atgaaaaatttacgtaacagaagctttttaactttattagacttttcacgacaagaggta
gaatttttattaacactctccgaagatttgaagcgtgccaaatatatcggcactgaaaag
cctatgctaaaaaataaaaatatcgcgcttctttttgaaaaagattccactagaacacgt 
tgcgcattcgaagttgccgcacatgatcaaggtgcacacgtcacttatcttggacctaca 
ggttctcaaatgggtaaaaaagaaactgctaaagatacagcacgtgtacttggtggtatg 
5 tatgatggtattgagtaccgaggtttctctcaacgtactgtagaaacattagcgcaatat 
tcaggtgttccggtatggaatggattaaccgatgaagatcaccctacacaagtgcttgct 
gactttttaactgctaaagaagtattgaaaaaagagtatgctgatatcaactttacttat 
gttggcgatggacgtaacaatgttgctaacgcattaatgcaaggtgctgccattatgggt 
atgaatttccatcttgtttgtcctaaagaactcaatccgacagaagaattattaaatcgt 
10 tgcgacgtattgcgacggaaaatggcggtaacattttaa 

Sequence 642 

MKNLRNRSFLTLLDFSRQEVEFLLTLSEDLKRAKYIGTEKPMLKNKNIALLFEKDSTRTR 
CAFEVAAHDQGAHVTYLGPTGSQMGKKETAKDTARVLGGMYDGIEYRGFSQRTVETLAQY 
SGVPVWNGLTDEDHPTQVLADFLTAKEVLKKEYADINFTYVGDGRNNVANALMQGAAIMG
MNFHLVCPKELNPTEELLNRCDVLRRKMAVTF* 

Sequence 643 

Contig_0500_pos_5053_3860, 

is similar to (with p-value 0.0e+00)

>sp:sp|Q07908|ARGJ_BACST GLUTAMATE N-ACETYLTRANSFERASE (EC 2
.3.1.35) (ORNITHINE ACETYLTRANSFERASE) (ORNITHINE TRANSACETY
LASE) (OATASE) / AMINO-ACID ACETYLTRANSFERASE (EC 2.3.1.1) (
N-ACETYLGLUTAMATE SYNTHASE) (AGS). >gp:gp|L06036|BACACETYL2
Bacillus stearothermophilus ornithine acetyltransferase (ar
gj) and acetylglutamate kinase (argB) genes, complete cds's,
argC gene, 3' end, and argD gene, 5' end. NID: g304133.
atgaatataattaagggaaatattgcaagtcctcttggattttcagctgatggtctgcac 
gctggctttaaaaagaaaaaattagactttggttggattgtttcagaagtacctgcaaat 

30 gtagctggtgtatttacaactaataaggtcattgctgcaccattaaaattaacaaaaaac 
agcatcgaaaaaagtggtaaaatgcaagctattgttgttaattcaggtattgctaattct 
tgtactggtaaacaaggagaaaaagatgcttttaaaatgcaacaactggccgcaaataaa
ttacaaattcaaccagaatatgttggtgtcgcatctactggtgttattggaaaggtgatg 
ccaatgtctattctaaagaatggcttttccaaactagttaaaaacggtaatgctgatgac 

35 tttgcaaaagcgatattaacaacggatactcatacaaaaacatgcgttgtaaacgaagaa 
tttggtagcgatacagtaacgatggcaggtgtagcaaaagggtcaggaatgatacatcct 
aatttggctacaatgctagcatttataacctgtgacgctaacatctcatcacaaacatta 
caacaggctttaaaagatgtggttgaagttacattcaatcaaatcactgtagatggtgac 
acttcaacaaatgatatggtgcttgtgatgtcaaatggatgtacaaataataacgaaatt 

40 aaaaaagacagcgaagactactataaatttaagcagatgcttctatatattatgaccgat 
ttagcaaaaagtattgcaagggatggcgaaggtgcttctaaattaatagaagtcacggtt 
aaaggtgcaaaagaatctagtgctgcaagaatgattgctaaaagtgtggtgggttcaagt 
ttagtaaaaaccgcaatttttggcgaagatcctaattggggtagaattattgctgctgca 
ggttatgctaaaacatattttgatattaatcaggtagacatttttataggtaggatacct

45 gtattaataagatcctcaccagtaaagtacgataaagaagaaattcaagaaataatgagt 
gctgaagaaatatcaattcagcttgaccttcatcaagggaattgtgaaggtcaagcatgg 
ggatgtgatttatcgtatgactacgttaaaatcaacgcactatacaccacttag 

Sequence 644 

MNIIKGNIASPLGFSADGLHAGFKKKKLDFGWIVSEVPANVAGVFTTNKVIAAPLKLTKN
SIEKSGKMQAIVVNSGIANSCTGKQGEKDAFKMQQLAANKLQIQPEYVGVASTGVIGKVM
PMSILKNGFSKLVKNGNADDFAKAILTTDTHTKTCVVNEEFGSDTVTMAGVAKGSGMIHP
NLATMLAFITCDANISSQTLQQALKDVVEVTFNQITVDGDTSTNDMVLVMSNGCTNNNEI 
KKDSEDYYKFKQMLLYIMTDLAKSIARDGEGASKLIEVTVKGAKESSAARMIAKSVVGSS 

LVKTAIFGEDPNWGRIIAAAGYAKTYFDINQVDIFIGRIPVLIRSSPVKYDKEEIQEIMS
AEEISIQLDLHQGNCEGQAWGCDLSYDYVKINALYTT* 

Sequence 645 

Contig_0500_pos_1725_1258, 




WO 01/34809 



PCT/US00/30782 



is similar to (with p-value 5.0e-27)

>sp:sp|P49786|BCCP_BACSU BIOTIN CARBOXYL CARRIER PROTEIN OF
ACETYL-COA CARBOXYLASE (EC 6.4.1.2) (BCCP).

atgaactttaaagaaataaaagaattaatcgaaattcttgatcaatctagtttaactgaa 
5 ataaatattgaagataataaaggtagcgtagttaatttaaaaaaagaaaaagagactgaa 
atagttacaccgcaagttactcaacaaccaactcaaccgataaatcatacgcataatgaa 
acacaacaaaagccatcacatagctctaaagatgaacaaagtagtgataatgaatacaat 
accattaatgcaccaatggttggtacattttataaatcaccttcaccagatgaagaagca 
tacgttcaagttggagataaagttacgaatgaaagtactgtttgtatattagaagctatg 
10 aaattatttaatgagattcaagccgaaacaacaggtgaaatcatagaaattttagtagaa 
gacggacaaatggtagagtatggccagccgttatttaaggtgaaataa 

Sequence 646

MNFKEIKELIEILDQSSLTEINIEDNKGSVVNLKKEKETEIVTPQVTQQPTQPINHTHNE
TQQKPSHSSKDEQSSDNEYNTINAPMVGTFYKSPSPDEEAYVQVGDKVTNESTVCILEAM
KLFNEIQAETTGEIIEILVEDGQMVEYGQPLFKVK*

Sequence 647 
Contig_0500_pos_0_925, 

is similar to (with p-value 6.0e-95)

>sp:sp|P49787|ACCC_BACSU BIOTIN CARBOXYLASE (EC 6.3.4.14) (A
SUBUNIT OF ACETYL-COA CARBOXYLASE (EC 6.4.1.2)) (ACC). >gp:
gp|U36245|BSU36245_2 Bacillus subtilis biotin carboxyl carri
er protein (accB) and biotin carboxylase (accC) genes, compl
ete cds. NID: g1055244.

atgggaataaaagatattgctaaagctgaaatgattaaagccaatgtacctgtagtacca 
ggaagtgaaggacttattcaaagtatagatgacgctaaaaaaatagctaaaaaaatcggc 
tatccagttatcatcaaagccacagcaggtggtggtggaaaaggtattcgggttgctcgt 
gatgagaaagaacttgaaactggttaccgtatgacacaacaagaagctgaaaccgcgttc 

30 ggaaatggtggtttatacttagaaaaatttatagaaaactttagacatatagagattcaa 
attattggcgatacttatggaaacgttatacatttaggtgaacgtgattgtacaattcaa 
agaagaatgcaaaagctcgttgaagaagcaccctcaccagttttaagtgaagataaacgc 
caagaaatgggtaatgctgcaattagagccgcaaaagctgtaaattatgaaaacgcaggt 
acaattgaatttatatatgatttagatgataaccaattttatttcatggaaatgaataca 

35 cgtattcaagttgaacacccagtaactgaaatggtaacaggagtagatttagtaaaatta 
caactcaaagttgctatgggtgaggcgttaccttttaaacaagaagatatttccattaac 
ggtcacgctattgaatttcgaatcaatgctgaaaatccttacaaaaactttatgccatca 
ccaggcaagattacccaatatcttgctccaggcggttttggagtgagaattgaatcagca 
tgttatactaattatacgataccaccttactatgactccatggtggcaaaacttatagtt 

40 cacgaacctacacgtgaagaatcaattatgacaggcattcgtgctttaagtgaatatctt 
gttttaggtatcgacactaTGATTT 

Sequence 648

MGIKDIAKAEMIKANVPVVPGSEGLIQSIDDAKKIAKKIGYPVIIKATAGGGGKGIRVAR
DEKELETGYRMTQQEAETAFGNGGLYLEKFIENFRHIEIQIIGDTYGNVIHLGERDCTIQ
RRMQKLVEEAPSPVLSEDKRQEMGNAAIRAAKAVNYENAGTIEFIYDLDDNQFYFMEMNT
RIQVEHPVTEMVTGVDLVKLQLKVAMGEALPFKQEDISINGHAIEFRINAENPYKNFMPS
PGKITQYLAPGGFGVRIESACYTNYTIPPYYDSMVAKLIVHEPTREESIMTGIRALSEYL 
VLGIDTMIX 

Sequence 649

Contig_0501_pos_9189_8275, 

is similar to (with p-value 0.0e+00)

>gp:gp|U94706|SAU94706_2 Staphylococcus aureus strain ATCC 8
325-4 cell wall/cell division gene cluster, yllB, yllC, yllD
, pbpA, mraY, murD, divIB, ftsA and ftsZ genes, complete cds
. NID: g2149889.

atgttaaacgaaaccattgattatttaaatattaaagaagatggtgtgtatgttgactgt 
acgttgggtggagcaggacatgccctctatttacttaatcaattaaatgataaaggtaga
cttattgcgattgatcaagatttaacagccatagaaaatgcgaaagaagttttaaaagaa
catttgcacaaagtcacttttgttcataacaactttcgagaattaacaaatattttaaat 
gaattagaaattgaaaaagtagatggtatttattatgacttaggtgtttcaagcccgcaa 
ttggatgtgcctgaaagaggctttagttatcacaatgatgcgaaactagatatgcgaatg 
5 gatcaaacacaatcactttctgcgtatgaagtagttaatcaatggtcttatgaagcatta 
gttaggattttctttcgttacggtgaagagaaattttctaaacaaattgcacgcagaatt 
gaagcccatcgagaacaacaacctatagaaacaactttagaactagttgatgtcattaaa 
gaaggcataccagcgaaagcaagacgaaaagggggacatcctgcgaaacgcgtgttccaa 
gctattcgaattgctgtgaatgatgagttatcagcttttgaagattcagttgagcaagcc 
10 attgaatgtgtgaaggtcggaggtagaatttcagttattactttccactctttggaagat 
cgtttgtgtaaacaaattttccaagagtttgagaaaggtccagacgtaccaagaggtctc 
cccgttattcctgaagcatatacacctaagttaaaacgagtaaatcgtaaaccgattacc 
gctactgatgacgatttaaacgaaaacaatcgagcacgtagcgccaagttacgcgtagca 
gaaatattaaaataa 


Sequence 650 

MLNETIDYLNIKEDGVYVDCTLGGAGHALYLLNQLNDKGRLIAIDQDLTAIENAKEVLKE 
HLHKVTFVHNNFRELTNILNELEIEKVDGIYYDLGVSSPQLDVPERGFSYHNDAKLDMRM 
DQTQSLSAYEVVNQWSYEALVRIFFRYGEEKFSKQIARRIEAHREQQPIETTLELVDVIK
EGIPAKARRKGGHPAKRVFQAIRIAVNDELSAFEDSVEQAIECVKVGGRISVITFHSLED
RLCKQIFQEFEKGPDVPRGLPVIPEAYTPKLKRVNRKPITATDDDLNENNRARSAKLRVA 
EILK* 

Sequence 651 
Contig_0501_pos_7793_5553,

is similar to (with p-value 0.0e+00) 

>gp:gp|AB007500|AB007500_3 Staphylococcus aureus genes for
penicillin-binding protein 1, MraY, MurD, partial and complete cds.
NID: g2463558. >gp:gp|U94706|SAU94706_4 Staphylococcus aureus strain
ATCC 8325-4 cell wall/cell division gene cluster, yllB, yllC, yllD,
pbpA, mraY, murD, divIB, ftsA and ftsZ genes, complete cds.
NID: g2149889.

gtgctaaggttttcttatgtaatgattactggccactctaatggtcaagatttaattatg 
aaagccaatgagaaatacttagtcaaaaattctcaacaaccagaacgaggtaagatttac 

35 gatcgtaatggtaaagttttagcagaagatgtagaaagatataaacttgttgcagttgtg 
gataaaaaagcaagtaaagaaagtaaaaagccgcgacacgtggttgataaaaaaaagaca 
gcaaaaaaattagctgaaatcatagatatggacgctgacgaaatagaaaaacgacttaat 
aataagaaagcctttcaaatcgaatttggtcagaaaggtactaatttaacttatcaagaa 
aaagaaaaaatagagaaaatgaaattacctggtatagcactttacccagaaactgagcga 

40 ttttatcctaatggtaattttgcctcccatttaatagggatggcacagaaagatcctgat 
actggtgaattaaatggtgcattaggtgttgaaaaaatatttaatagttatttaaatgga 
tcaagaggtgcacttaaatatatacatgatatatggggctacatcgcacctaatacgaag 
aaagagcagcaacctaaacgtggagatgatgtacacttaacaattgattctaatatacaa 
gtctttgtggaagaagctcttgatgacatggttgaacggtatgctccaaaagatttattt 

45 gcagtagtaatggacgcaaaaactggtgaaatacttgcatatagccaacgtccaactttt 
aatcctgaaacaggtaaagattttggcaaaaagtgggcgaacgatttatatcaaaataca 
tatgaaccgggctctacttttaaaacatacggcttagctgcagcaattcaagaaggtaaa 
ttcaaaccggatgaaaagtataaatcaggtcatagaaatattatgggctctgaaatttcc 
gattggaataaaactggttggggacgtatacctatgtcgttaggttttacttattcatca 

50 aatacgttgatgatgcacttacaagatttggttggtgccgataaaatgaaatcttggtat 
gaacgctttggatttggcaaaaaaacgggtggtatgtttgatggagaagctgcaggtaat
attggttgggcaaatgaattacaacaaaaaacgtcagcatttggtcaatccacaactgtt 
acccctgctcaaatgattcaagcacaatcggctttctttaataaaggaaatatgcttaaa 
ccatggtttgtaagtagtattgataatccaataactaaaaagaattattactctggtaaa 

55 aaagagtttgtcggtaaaccagtaacggaagaaacagccaataaagttgaagaagaactt 
gataaagtagtaaatagtaagaagagtcatgctatgaattatcgcgtaaaaggttatgat 
attgaaggtaagacaggaacagcacaagtagctgattcaaatggaggcggttatgttaaa
ggtgaaaatccttactttgtaagcttcatgggggatgcacctaagaaaaatcctaaagtc 
attgtctatgcaggtatgagtcttgctcaaaaaaatgatcaagaagcatatgaaatgggt 
gtgagcaaagcatttaaaccaattatggagaatacgctgaaatatttaaatgttggaaaa
tctagtgatacttcatcaaaaactgactatagtaaagtgcctaacgtgcaaggagatgaa 
gttcaaaaagcagaggatagcgtcaatgctcaatctcttaaacctattacgattggtaat 
ggcaaacagattaaacaacaatcagttaagtcaggtaccaaagtcctaccacacagtaaa 
5 gtaatgttaatgacagacggggaattaacaatgccggatatgaccggatggacaaaggaa 
gatgtacttgcttttgaagatttaacgaaacttaaagtttctactaaaggtaatggattt 
gtcacgaatcaaagtatctcaaaaggtcaaatcattaaaaataaagataagatagaagtg 
tcattatctgctgaagatacggatgatgaccaagagaaaactgatgaggactcttcggat
aacaaatcaaagaaagataaagctgatgaggatcattcaaatacatcttcgtcaactaag 
10 aatgataagtcaaacgccgactcgaaaaatgattctgatgacagcacaaatgaaacatca 
ggttctgagagaaataattaa 

Sequence 652 

VLRFSYVMITGHSNGQDLIMKANEKYLVKNSQQPERGKIYDRNGKVLAEDVERYKLVAVV
DKKASKESKKPRHVVDKKKTAKKLAEIIDMDADEIEKRLNNKKAFQIEFGQKGTNLTYQE
KEKIEKMKLPGIALYPETERFYPNGNFASHLIGMAQKDPDTGELNGALGVEKIFNSYLNG
SRGALKYIHDIWGYIAPNTKKEQQPKRGDDVHLTIDSNIQVFVEEALDDMVERYAPKDLF 
AVVMDAKTGEILAYSQRPTFNPETGKDFGKKWANDLYQNTYEPGSTFKTYGLAAAIQEGK 
FKPDEKYKSGHRNIMGSEISDWNKTGWGRIPMSLGFTYSSNTLMMHLQDLVGADKMKSWY 
ERFGFGKKTGGMFDGEAAGNIGWANELQQKTSAFGQSTTVTPAQMIQAQSAFFNKGNMLK
PWFVSSIDNPITKKNYYSGKKEFVGKPVTEETANKVEEELDKVVNSKKSHAMNYRVKGYD 
IEGKTGTAQVADSNGGGYVKGENPYFVSFMGDAPKKNPKVIVYAGMSLAQKNDQEAYEMG 
VSKAFKPIMENTLKYLNVGKSSDTSSKTDYSKVPNVQGDEVQKAEDSVNAQSLKPITIGN 
GKQIKQQSVKSGTKVLPHSKVMLMTDGELTMPDMTGWTKEDVLAFEDLTKLKVSTKGNGF 
VTNQSISKGQIIKNKDKIEVSLSAEDTDDDQEKTDEDSSDNKSKKDKADEDHSNTSSSTK
NDKSNADSKNDSDDSTNETSGSERNN*

Sequence 653 

Contig_0501_pos_5286_4399,

is similar to (with p-value 0.0e+00)

>gp:gp|AB007500|AB007500_4 Staphylococcus aureus genes for
penicillin-binding protein 1, MraY, MurD, partial and complete cds.
NID: g2463558.

atgaagtttggacaaagtatccgtgaggaagggcctcaaagccatatgaaaaaaacaggt 
35 actcctactatgggtgggcttacatttttaattagtattataattacctctatcattgca 
attatctttgtagaccattcaaatccaattattttgttactatttgtaacaatcggtttt 
ggtcttattggatttattgatgactatattattgtagttaaaaagaataaccaaggatta 
actagtaaacaaaagtttctagcacaaataattattgcagttatattctttgtgctaagt 
gatgtatttcaccttgtgcattttacgacagatttgcatattccatttgtgaattttgat 
attccgttgtcatttgcttatgtgatatttatcgtcttttggcaagttggtttctcaaat
gctgtaaacttaactgatggtttagatggattggcaactggtttgtcaataataggtttt 
gcaatgtatgctgtaatgagttacatgttagattcaccggctattggcatattttgtatt 
ataatgattttcgctttactaggtttcttaccttacaatttaaatccagcgaaagttttc 
atgggagacacaggaagtcttgctctaggtggtatttttgcaacgatttcaatcatgttg 
45 aatcaagaattatcattaatattaattggttttgtgtttgtagttgagacattatctgta 
atgttacaagtagcctcatataaattaacgaagaaacgtattttcaagatgagtcctata 
catcaccacttcgaattaagtggttggggtgaatggaaagtagtaacagtattttggacg 
gtaggtttaattacgggattaataggtttatggattggagtgcattaa 

Sequence 654

MKFGQSIREEGPQSHMKKTGTPTMGGLTFLISIIITSIIAIIFVDHSNPIILLLFVTIGF
GLIGFIDDYIIVVKKNNQGLTSKQKFLAQIIIAVIFFVLSDVFHLVHFTTDLHIPFVNFD
IPLSFAYVIFIVFWQVGFSNAVNLTDGLDGLATGLSIIGFAMYAVMSYMLDSPAIGIFCI
IMIFALLGFLPYNLNPAKVFMGDTGSLALGGIFATISIMLNQELSLILIGFVFVVETLSV 

MLQVASYKLTKKRIFKMSPIHHHFELSGWGEWKVVTVFWTVGLITGLIGLWIGVH*

Sequence 655 

Contig_0501_pos_4235_3048,

is similar to (with p-value 0.0e+00) 
>gp:gp|AF009671|AF009671_1 Staphylococcus aureus
UDP-N-acetylmuramoyl-L-alanine:D-glutamate ligase (murD) gene,
complete cds. NID: g2305091.

atgggcattgaggtaattagcggtagtcatcctttttctttattagatgatgatcctatc 
5 attgtgaaaaacccaggtattccatatactgtatcaattattaaagaagcagcaaataga 
gggcttaaaatcttaacagaggttgaacttagctatttaatttctgaggcaccaatcata 
gcagttactggaactaacggtaaaactactgtcacttcactaatcggtgatattttccaa 
aaaagcgtgttgactggacgactttctgggaatattggttatgtagcctcaaaagttgca 
caagaagttaaatcagatgagtatttaataacagaattatcatcttttcaattattaggc 

10 attgaagaatataaaccacatatcgctatcattactaatatttattctgcacatttggat 
taccatgaaacgttagagaactatcaaaatgctaaaaagcaaatatataaaaatcaaact 
aaagatgattatctcatttgtaattatcatcaaagacacctaattgaatcagaaaatcta 
gaagcgaaaacattttatttttcaacacagcaagaagttgatgggatatacattaaagat 
ggtttcattgtttttaacggcattcgcattattaacactaaagacttagtgctaccagga 

15 gaacataacctggaaaatattttagcagctgttctagcatcaatcattgctggagtgcca 
gtcaaagctattgtagatagtcttgttactttttccggtattgatcatagacttcagtat 
attggtacaaatcgcacaaataaatattataatgattcaaaagcaactaatactttagct 
actcaatttgcgcttaactcttttgatcaaccaattatttggttgtgtggtggattagat 
cgtggtaatgaattcgatgaacttattccttatatggaaaatgtacgtgtgatggttgtt 

20 tttggagaaacacaagataaatttgctaaattgggaaatagtcaaggtaagtatgtgatt 
aaagcaacagatgtagaggatgctgttgataaaattcaagatatagtcgagccaaatgat 
gttgttctattatcaccagcttgtgcaagttgggatcagtatcatacatttgaagaacgt 
ggtgagaagtttatcgatagattccgagcgcacttgccatcatactaa

Sequence 656

MGIEVISGSHPFSLLDDDPIIVKNPGIPYTVSIIKEAANRGLKILTEVELSYLISEAPII
AVTGTNGKTTVTSLIGDIFQKSVLTGRLSGNIGYVASKVAQEVKSDEYLITELSSFQLLG
IEEYKPHIAIITNIYSAHLDYHETLENYQNAKKQIYKNQTKDDYLICNYHQRHLIESENL
EAKTFYFSTQQEVDGIYIKDGFIVFNGIRIINTKDLVLPGEHNLENILAAVLASIIAGVP 

VKAIVDSLVTFSGIDHRLQYIGTNRTNKYYNDSKATNTLATQFALNSFDQPIIWLCGGLD
RGNEFDELIPYMENVRVMVVFGETQDKFAKLGNSQGKYVIKATDVEDAVDKIQDIVEPND 
VVLLSPACASWDQYHTFEERGEKFIDRFRAHLPSY* 

Sequence 657 
Contig_0501_pos_3038_1620,

is similar to (with p-value 3.0e-89) 

>gp:gp|U94706|SAU94706_7 Staphylococcus aureus strain ATCC 8325-4
cell wall/cell division gene cluster, yllB, yllC, yllD, pbpA, mraY,
murD, divIB, ftsA and ftsZ genes, complete cds. NID: g2149889.

atgctcatggaagagaataaaaatcaacctaataaggagaatatgtcgaataaagacgat 
aatgcaactcatttgaatgacagtcacagaaatgaagatttagagctttttagacggaat 
aaaaacgctcgccaacgcagaagacgtcgtatagataaccaaagtaaagaaaaggatgct 
acgtctacacaatcacagttagaaactaaaccaatggataaatttttagataatcacaag 

45 tcgcataatcaaaacaaagaaataaaaagtgacttaattgaagaaaatgttaatgatgaa 
aacgacaatcaaaaaaatattaatgataaattaaatgaccgtattgtccaagaaacaaat 
gaaagtcgtcaaagtactgaagacgatgaggaatttctacttgatcatcggagtgaacaa 
caacctaaagcctctcgtcattctaaaaagcataaattactaagtaaatttacttctaaa
aaagaaaaggaaacatctacatcgttcaatagtaatgagaaggtaactcaaattaaaccg 

50 cttagtttagaagaaaaaagagccataagacgtaaaaagcaaaaaagaatccaatatacc 
attatcacactactcattcttatcattgttctcattttactctatatgtttacaccactg 
agtaaaatatcaaatgtaaatgttaaaggtaataacaacgtaagtacgagtaaaataaag 
aaagaacttaacgttacttcgcgatcacgaatgtatacttttagtaaaaataaagcgatt 
aggaacttaaaacagaatcctttaatcaaagaagttgatattcataaacaattaccaaac 

55 acgttaactgtcaacgtgactgagtaccaaattgtcggtttagaaaaaaataaagataaa 
tatgtgccaattatagaagatggtaaagaattaacagaatacaaagatgaagtgtcacat 
gatgggcctatcattgatggtttcaaaggagacaaaaaaacacgaattataaaagcttta 
tcagaaatgtcacctaaagtgagaaacttaattgcagaggtgagttacgcaccaactaaa 
aataaacaaagtcgcataaaaatcttcaccaaagataatatgcaagttattggtgacatt 
acaacgattgcagacaaaatgcaatattatcctcaaatgtcacaatcattaagcagagat
gactctggcgaacttaagacaaatggctatattgatttatcggttggagcgtcatttatt 
ccttatcaaggttcatcaactgttcaatcgggtacagaacaaaatgtaaccaagtcaaca 
caagaagaaaatgatgcaaaagaagaacttcaaaatgtgttgaataaaataaataaacaa 
5 tctaccagtggcgaaggcgactttctggtctgtaactga 

Sequence 658 

MLMEENKNQPNKENMSNKDDNATHLNDSHRNEDLELFRRNKNARQRRRRRIDNQSKEKDA 
TSTQSQLETKPMDKFLDNHKSHNQNKEIKSDLIEENVNDENDNQKNINDKLNDRIVQETN 

ESRQSTEDDEEFLLDHRSEQQPKASRHSKKHKLLSKFTSKKEKETSTSFNSNEKVTQIKP
LSLEEKRAIRRKKQKRIQYTIITLLILIIVLILLYMFTPLSKISNVNVKGNNNVSTSKIK 
KELNVTSRSRMYTFSKNKAIRNLKQNPLIKEVDIHKQLPNTLTVNVTEYQIVGLEKNKDK 
YVPIIEDGKELTEYKDEVSHDGPIIDGFKGDKKTRIIKALSEMSPKVRNLIAEVSYAPTK
NKQSRIKIFTKDNMQVIGDITTIADKMQYYPQMSQSLSRDDSGELKTNGYIDLSVGASFI 

PYQGSSTVQSGTEQNVTKSTQEENDAKEELQNVLNKINKQSTSGEGDFLVCN*

Sequence 659 

Contig_0501_pos_1207_863,

putative peptide of unknown function 

20 gtgacaaaccggaggaaggtggggatgacgtcaaatcatcatgccccttatgatttgggc 
tacacacgtgctacaatggacaatacaaagggtagcgaaaccgcgaggtcaagcaaatcc 
cataaagttgttctcagttcggattgtagtctgcaactcgactatatgaagctggaatcg 
ctagtaatcgtagatcagcatgctacggtgaatacgttcccgggtcttgtacacaccgcc 
cgtcacaccacgagagtttgtaacacccgaagccggtggagtaaccatttggagctagcc 

25 gtcgaaggtgggacaaatgattggggtgaagtcgtaacaaggtag 

Sequence 660 

VTNRRKVGMTSNHHAPYDLGYTRATMDNTKGSETARSSKSHKVVLSSDCSLQLDYMKLES 
LVIVDQHATVNTFPGLVHTARHTTRVCNTRSRWSNHLELAVEGGTNDWGEVVTR* 


Sequence 661 

Contig_0502_pos_1097_1513,

putative peptide of unknown function 

atgcatgttggctaccaggcacgcaaaataacatatcaaaatcattcctataagatacac 
35 tttgataatggacatattgttcactctcatactgaaccgattattgcaactggatttgat 
gttacccaaaaccctttaatagaacaactatttcaagtacgacaatcagaagttcaatta 
acagaattagacgaatctacaaagtttcctaatgtatttttaattggggcgactgtacgt 
catcaaaatgccattctttgttatatatataaattcagagcacgttttgcagtattagca 
cgcatagtaagcctacgcgaaggcttacctgaagatacatcattaattcagtcgtatcgt 
40 caaaaaaatatgtttctagacgattatagttgttgtgatgtgaattgcacatgttaa 

Sequence 662 

MHVGYQARKITYQNHSYKIHFDNGHIVHSHTEPIIATGFDVTQNPLIEQLFQVRQSEVQL 
TELDESTKFPNVFLIGATVRHQNAILCYIYKFRARFAVLARIVSLREGLPEDTSLIQSYR 
45 QKNMFLDDYSCCDVNCTC* 

Sequence 663 

Contig_0502_pos_1702_2889,

putative peptide of unknown function 

50 atgattaaaaagttagagacttttgttaaggagagcaaagcataccatcctcaatttcac 
aatattatttgtggatggtgctttattgtacttatgtttgctttacccatctacttgtct 
tatcacgtttcagacggattacaatactacgtgtcacactggttaactaaactatcccaa 
atatcattatttcaagaaaataagctacaacatattttattcggtcattatggtgtcatt 
tctttaggtacatattcattcgtctgggcgttaccagttgttttcatgattagtttatct 

55 acagctcttatagatataactcatttaaagcattatatcgtttggtctatcgaacctacg 
atgatgagacttggtcttcatgaatcagatatcatacccttgttagaaggatttggatgt 
aatgctgctgctattactcaagctacacatcaatgtcatcgttgtacaaaagtacaatgt
atgagcttggtaagtttcggaactgcatgtagttatcaaattggtgctacattatcgata 
ttcaacgcaagccaccgctcttggttgtttttgccatacataggcatggttttcttagga 
ggaatcatacataacaaactatggtatagtcatcaaacacctatgacaacacagtctgtt
tttcaacgacaacctgtacgttggcccaaaccaaagctactcttaaaagcagcgtggaaa 
agtattcaaatgtttattgtacaagccttacctatttttataggaatttgccttattgta 
agtctattgtctcttacgtctattttgacttttatatcaaatgcattcatacctttatta 
5 tggctactagatgtacctacacagcttgcaccaggtattctgttttcaatgatacgtaag 
gatgggatgttgttgtttaatatgaatggcggtactttaattcaaagactttccgcattc 
caattattgttgctagtcttttttagttcaacatttacagcatgttcagtaacaatgact 
atgctcatgcgtcgactcggttcaattctaggaattaaaatgataatgaaacaaatggta 
tcgtccacaatttgcgtcaccatactagccatagcaatgttaagcataactaaaatttca
10 gacttaggagtgatgttatggaaatcattattatcggtggttttttag 

Sequence 664 

MIKKLETFVKESKAYHPQFHNIICGWCFIVLMFALPIYLSYHVSDGLQYYVSHWLTKLSQ
ISLFQENKLQHILFGHYGVISLGTYSFVWALPVVFMISLSTALIDITHLKHYIVWSIEPT
MMRLGLHESDIIPLLEGFGCNAAAITQATHQCHRCTKVQCMSLVSFGTACSYQIGATLSI
FNASHRSWLFLPYIGMVFLGGIIHNKLWYSHQTPMTTQSVFQRQPVRWPKPKLLLKAAWK
SIQMFIVQALPIFIGICLIVSLLSLTSILTFISNAFIPLLWLLDVPTQLAPGILFSMIRK 
DGMLLFNMNGGTLIQRLSAFQLLLLVFFSSTFTACSVTMTMLMRRLGSILGIKMIMKQMV 
SSTICVTILAIAMLSITKISDLGVMLWKSLLSVVF*


Sequence 665 

Contig_0502_pos_2964_3740, 

putative peptide of unknown function 

atgaatgaatttggaaaaagaagtgttgatggccaacttatagaacatcctgaagtacct 
25 atgagtgaaatcactgaaggatgcatttgttgtgcgatgaaatcagacgtatcacaacaa 
ctacatgaactatacttaaaatatcaaccagatatcatctttattgaatgcagtggtgta 
gctgaaccactagctgtcgtcgatgcattcttcacacccgtacttgcaccttttatcact 
ttaaggagtatggtgggaattattgatgcaagcatgtattcacgaattaaatcttatcca 
caagacattgcagctctattttatgaacaacttcgtcattgttcgactttatttgttaat 
30 aaaatagataagatagaggtggaagaaaccgcccgcttgctacgtcaactcgagcgtctc 
aatagcgatgccaatattcaagttggtcaatttggagaattaaatttaaaatcactgcta 
gagccaacacatataaattcaaatgcatgtggcactttgcatagtaatataaatcatcaa 
ttcatcgaaaatcctaggctacaaacaaaagaagaaatgattagtgcgttagataacttg
cctcaagatgtttaccgtgtcaaagggtttgttcgtttttcagatcagcaacacgtttat
ttagtacagtatgcacaaggaaatatagaattatctcccattcaacttaaaaacgatgta
ccattgtacctcattgttataggaaaacatttaaaacaaatacaatttgatttataa

Sequence 666 

MNEFGKRSVDGQLIEHPEVPMSEITEGCICCAMKSDVSQQLHELYLKYQPDIIFIECSGV 
AEPLAVVDAFFTPVLAPFITLRSMVGIIDASMYSRIKSYPQDIAALFYEQLRHCSTLFVN
KIDKIEVEETARLLRQLERLNSDANIQVGQFGELNLKSLLEPTHINSNACGTLHSNINHQ 
FIENPRLQTKEEMISALDNLPQDVYRVKGFVRFSDQQHVYLVQYAQGNIELSPIQLKNDV 
PLYLIVIGKHLKQIQFDL* 

Sequence 667

Contig_0502_pos_7308_8495,

putative peptide of unknown function 

atgaagaaaaaattaagttatatgattaccattatgcttgcttttacgctaagtttagca 
cttggcctatttttcaatagtgctcacgccgactcgttaccacaaaagaatggtgcaaac 

50 caaaaaacaactaaagtcactgtcagtaataaagacgttccagatgcagttcgcaaactt 
gctgaagaacaatatttatctcgtgtagctttattagataaagcttccaaccacaaagca 
acatcgtatacacttggtgaaccttttaaaatttataaatttaataaggaaagcgacggc 
aattattattatccagtgctcaataaaaaaggagatgtcgtttatgtagtaacaatttct 
cctaatccttcaaattctaaagcttcaaaacagcaaaacaattattccattaatgtttct 

55 ccatttctttctaaaatattaaaccaatataaaaatcaaaagataacaattttgactaat 
acaaaaggatattttgcacttactgaagatggtaaagtgacacttgtgcttaaaacgcca 
cgtaataatgaaaaaacatatgaaaatgccactgaatccactaaacctaaagatttaaat 
gattttaaacaaactgcatcagtaacaaaaccaactttagaatatcaaagtacacgaaat 
gaaatgtacgcagaatatgtaaatcaattaaagaatttcagaatacgagaaacacaaggg 
tataatagttggtgtgccggctataccatgtcagcactattcaatgccacatataataca
aatcgatataatgcagaatcagtaatgagatatttacatcctaatttaagaggtcacgac 
ttccaatttacaggactaacatctaacgagatgcttcgttttggtagatcacaaggcaga 
aatactcaatatcttaatagaatgacttcatataatgaagtagaccaattaacaactaat 
5 aatcaaggtatagctgtattaggtaagcgtgttgaatcaagcgatggtattcacgctgga 
catgccatggctgtggctggtaatgctaaagttaacaacggacaaaaagtcattttaatt
tggaacccatgggacaatggtctcatgactcaagatgcacatagtaatatcattccagta 
tcaaatggcgatcactatgaatggtatgcatcaatttatggttattaa 

Sequence 668

MKKKLSYMITIMLAFTLSLALGLFFNSAHADSLPQKNGANQKTTKVTVSNKDVPDAVRKL 
AEEQYLSRVALLDKASNHKATSYTLGEPFKIYKFNKESDGNYYYPVLNKKGDVVYVVTIS
PNPSNSKASKQQNNYSINVSPFLSKILNQYKNQKITILTNTKGYFALTEDGKVTLVLKTP
RNNEKTYENATESTKPKDLNDFKQTASVTKPTLEYQSTRNEMYAEYVNQLKNFRIRETQG 

YNSWCAGYTMSALFNATYNTNRYNAESVMRYLHPNLRGHDFQFTGLTSNEMLRFGRSQGR
NTQYLNRMTSYNEVDQLTTNNQGIAVLGKRVESSDGIHAGHAMAVAGNAKVNNGQKVILI 
WNPWDNGLMTQDAHSNIIPVSNGDHYEWYASIYGY*

Sequence 669 
Contig_0502_pos_15222_15602,

putative peptide of unknown function 

gtgatttcatctactggttgttttgttattttttctgttggttcaccttcgccaactttt 
tcccctgttaatgggttcttagttgttggtgttgtaattgtttttgttcctggttcacct 
ttctgtttaacacgctcttcacctggttttaaatcaggattgaattcacgtttcttgtcg 
25 aatggaatttcttccgttgacgtaatcgaatctccatcaactggaccatattttgtcaca 
tcatccactggtggtgtgactacttcgcctgtatcaggatttttaactcctggtttacct 
ggaacgtcctcttggctacctttcggtgcgtttggatcaaattcatccttatggcctggc 
ttgatttcttcgccaccataa 

Sequence 670

VISSTGCFVIFSVGSPSPTFSPVNGFLVVGVVIVFVPGSPFCLTRSSPGFKSGLNSRFLS 
NGISSVDVIESPSTGPYFVTSSTGGVTTSPVSGFLTPGLPGTSSWLPFGAFGSNSSLWPG 
LISSPP*

Sequence 671

Contig_0502_pos_11871_10762,

is similar to (with p-value 2.0e-93) 

>sp:sp|P53555|BIOA_BACSU ADENOSYLMETHIONINE-8-AMINO-7-OXONONANOATE
AMINOTRANSFERASE (EC 2.6.1.62) (7,8-DIAMINO-PELARGONIC ACID
AMINOTRANSFERASE) (DAPA AMINOTRANSFERASE). >gp:gp|AF008220|AF008220_74
Bacillus subtilis rrnB-dnaB genomic region. NID: g2293135.
>gp:gp|U51868|BSU51868_2 Bacillus subtilis biotin biosynthetic operon
genes, complete and partial cds. NID: g1277024.
>gp:gp|Z99119|BSUB0016_96 Bacillus subtilis complete genome
(section 16 of 21): from 2997771 to 3213410. NID: g2635411.

atgctattaggttcgtctaatattccatcgattgagctagccgaacagttagtcaaatta 
acaccagatagattacaaaaagtgttttactctgatacagggagtgcgtcggtagagatt 
gctattaagatggcttatcaatattggaagaatatcgatgctgaacgatatgcgaagaag 

50 aataaatttcttacattacatcatggatatcatggagatacaataggttctgttagcgtt 
ggtggtatcgatagtttccacaaaatttttaaagaccttatttttgaaaatatacagata 
gaaacaccgtgtttatataaaagtaagtaccgcaatgaagcggaaatgcttaattcaata
ctgaatcaaattgaaaatatattatctgagagaaatgatgaaatagtaggatttattcta
gagccacttatacaaggtgcaacaggtttattcgttcatccgcatggttttttgaaagct 

55 gtagaacagttatgtagaaaatatgatgtattactaatttgtgatgaagtagcggttggt 
ttcggacgtacgggagaaatgtttgcttgtaaccatgaagatgtacaaccagatattatg 
tgtctgggtaaggcgattacaggtggttatttaccgttagcggcaactttaacatctcaa 
aagatatatgatgcttttttaagtcagagtcacggtaagaatacgtttttccacggtcat 
acatatacaggtaatcagttagtttgttccgtagcacttgagaatattaatctttttaaa 
aagaagcatctgattgggcacattcaaaagacatctcaaacattaaagcaacgcttagag
gcacttcaacctcataaaaatattggagatattagagggcggggattaatgtatggtgtg 
gaattagttgaaaacaaatcaacgcagacaccactcgatattccaactgtagaactgatt 
atacatcgatgtaaagagaatggattgatgattcgtaatttggaaaatgtcatcactttc 
5 gtacctattttaagtatgtctaataaagaaattaaaaaaatggttaaaattttcaacaaa 
gccttacatcaaacattgggtaagaagtaa 

Sequence 672 

MLLGSSNIPSIELAEQLVKLTPDRLQKVFYSDTGSASVEIAIKMAYQYWKNIDAERYAKK 
NKFLTLHHGYHGDTIGSVSVGGIDSFHKIFKDLIFENIQIETPCLYKSKYRNEAEMLNSI
LNQIENILSERNDEIVGFILEPLIQGATGLFVHPHGFLKAVEQLCRKYDVLLICDEVAVG 
FGRTGEMFACNHEDVQPDIMCLGKAITGGYLPLAATLTSQKIYDAFLSQSHGKNTFFHGH 
TYTGNQLVCSVALENINLFKKKHLIGHIQKTSQTLKQRLEALQPHKNIGDIRGRGLMYGV 
ELVENKSTQTPLDIPTVELIIHRCKENGLMIRNLENVITFVPILSMSNKEIKKMVKIFNK 
ALHQTLGKK*

Sequence 673 

Contig_0502_pos_9610_8939,

is similar to (with p-value 2.0e-30) 

>sp:sp|P53559|BIOW_BACSU 6-CARBOXYHEXANOATE--COA LIGASE (EC 6.2.1.14)
(PIMELOYL-COA SYNTHASE). >gp:gp|U51868|BSU51868_1 Bacillus subtilis
biotin biosynthetic operon genes, complete and partial cds.
NID: g1277024.

atgcgtgcaagccacgaagatattcatattagtggtgctgaaacaatgtgtgaatttgag 

gatttagaaaattatttaaaaaaatattttaataaagcatttaatcatgaaaatggaaat
atagatttcttaaatttgaaaattgaaaaggttaaggcaccgattcaaacgttagtagca 
Ltaccagtggttgaaaatctaaacgatactttaacacaattagcaaaacaaacaggtgtt 
tctgaatatgcgctaaacaaagggttagaatttataaaaaatgatattacttatactgga 
gccattattctatctgcacaaaccggacaacgacttgatagcactgaacaacgaggtatc 

30 agggtaacacaattagcatttaaaacatgcaaatgtaatggagaaatatcagaaagagta 
aaagatgcacgtgcacttgcaacttgtatcaatgcatttgaaggtgtaaaggcagaacta 
tgtgtatcagacgatttgcattacacgactggatattttgcgtcgcctaagttaggatat 
cgtagaatctttaatattaaagaaaaaggtacgcgtcacggaggaagaattatcttcgta 
gacgaagaaataaatttaaatgaatatgtttcctttttagaaacagtacctaaagaaatc 

35 atagaaaaataa 

Sequence 674 

MRASHEDIHISGAETMCEFEDLENYLKKYFNKAFNHENGNIDFLNLKIEKVKAPIQTLVA 
LPVVENLNDTLTQLAKQTGVSEYALNKGLEFIKNDITYTGAIILSAQTGQRLDSTEQRGI 
RVTQLAFKTCKCNGEISERVKDARALATCINAFEGVKAELCVSDDLHYTTGYFASPKLGY
RRIFNIKEKGTRHGGRIIFVDEEINLNEYVSFLETVPKEIIEK*

Sequence 675 

Contig_0502_pos_6828_4888,

is similar to (with p-value 0.0e+00)

>gp:gp|AF090142|AF090142_1 Staphylococcus epidermidis lipase
precursor (gehD) gene, complete cds. NID: g3789931.

gtgatttttttgaaaaataataatgaaacaagaagatttagcattaggaagtacacggtg 
ggagtcgtgtcaatcattactgggattacaatatttgtcagtggtcagcatgctcaagct 

50 gctgaaatgacacaatcatcatcagattttaacgaacagtcacaacaaacagaacaagtt 
gaacacaaagaagatacaactcatttatcatacgaattgaatcaagagggtgacacagct 
agccaatcaaagactaatcaagagaaccaatctgatgaaaatgtacaaaaaaagaataat 
caaactcaacaagattcaacacaaacgtcaccattaaatgaccaagaacaaactttaaag
gggcaacaatcaaaagacaatcatgttaccccaaattcacgtcaggatacatatccaaaa 

55 ggccaaaatcaagatgataaaggcaaacaacagtttaaagataatcaacactcacaaaca 
gaacatcaacctaatactcaaaaccaaaataatgatcaagattcatcagataaaaagcaa 
cacccatctgatcaaactcaagccccatcttcaaaaggaacacaacctaaacaatcacag 
tctataggagatagagataaaacagtaaaacaaccatcttctaaagtacacaaaataggt 
aatacaaaaactgataaaacagttaaaacaaatcaaaaaaagcaaacatcattaacttca 
ccacgcgttgtgaaatcaaaacaaactaaacatatcaatcaacttactgcgcaagctcaa
tataaaaatcaatatccagtcgtgtttgtacatggatttgtaggtttagtcggtgaagat 
tcattcagcatgtacccaaattattggggtggtactaaatataacgtgaaacaagaactt 
acaaaattaggttaccgagttcacgaagccaatgtaggagcatttagcagcaattatgac 
5 cgtgctgttgaactgtattattatattaaaggtggaagagtagattatggtgcagcacat 
gctgcaaaatatggtcacaagcgttatggcagaacatatgaaggcatcatgcctgattgg 
gaaccaggtaaaaagatacatcttgttggacatagtatgggtggccaaacgatacgcttg 
atggaacattttttaagaaatggaaatcaagaagaaatagattatcaacgtcaatatggt 
ggtacggtatctgatttgtttaaaggtggacaagataacatggtgtctacgattactaca 

10 ttaggaacacctcataatggcacacctgctgcagataaactagggtcgactaaatttatc 
aaagatacaattaatagaattggaaaaattggtggaactaaagcgctcgatttagaacta 
ggtttttctcaatggggcttcaaacagaaacctaatgaatcatatgctgaatatgcaaaa 
cgtatagcgaatagtaaagtttgggagactgaagatcaggctgtaaatgatttaacaact 
gttggagcagaaaagttaaaccaaatgacgacattgaatcctaatatcgtctatacatca 

15 tatacaggtgctgcaacacatactggaccattaggcaatgaagtgccgaatattagacaa 
ttcccactattcgatttaacaagtcgtgtgataggtggagatgataataaaaatgtcaga 
gtaaatgatggcatagtacctgtgtcttcttcactacatccaagtgatgaagcatttaag 
aaggtaggtatgatgaacctagcaactgacaagggtatttggcaagtgagacccgtacaa 
tatgattgggatcatctagatttagtcggcttagatactactgattataagcgaactgga 

20 gaagaattaggtcaattctatatgagtatgataaataatatgttgaaagtcgaagagtta 
gatggtattacacgtaagtag 

Sequence 676 

VIFLKNNNETRRFSIRKYTVGVVSIITGITIFVSGQHAQAAEMTQSSSDFNEQSQQTEQV
EHKEDTTHLSYELNQEGDTASQSKTNQENQSDENVQKKNNQTQQDSTQTSPLNDQEQTLK
GQQSKDNHVTPNSRQDTYPKGQNQDDKGKQQFKDNQHSQTEHQPNTQNQNNDQDSSDKKQ 
HPSDQTQAPSSKGTQPKQSQSIGDRDKTVKQPSSKVHKIGNTKTDKTVKTNQKKQTSLTS 
PRVVKSKQTKHINQLTAQAQYKNQYPVVFVHGFVGLVGEDSFSMYPNYWGGTKYNVKQEL 
TKLGYRVHEANVGAFSSNYDRAVELYYYIKGGRVDYGAAHAAKYGHKRYGRTYEGIMPDW 
EPGKKIHLVGHSMGGQTIRLMEHFLRNGNQEEIDYQRQYGGTVSDLFKGGQDNMVSTITT
LGTPHNGTPAADKLGSTKFIKDTINRIGKIGGTKALDLELGFSQWGFKQKPNESYAEYAK 
RIANSKVWETEDQAVNDLTTVGAEKLNQMTTLNPNIVYTSYTGAATHTGPLGNEVPNIRQ 
FPLFDLTSRVIGGDDNKNVRVNDGIVPVSSSLHPSDEAFKKVGMMNLATDKGIWQVRPVQ 
YDWDHLDLVGLDTTDYKRTGEELGQFYMSMINNMLKVEELDGITRK* 


Sequence 677 

Contig_0502_pos_4703_3888, 

putative peptide of unknown function 

atgtatacaataatagagagatgtgaaaagatgaaatattatgggaagtgcatttcttac 
40 ataagcattttaatattaacgttttttattggcggatgtggatttatgaataaagaaaat 
aataaagaagcggaaattaaagaaaattttaataaaacattaagtatgtatccaattaaa 
aatttagaagatttatacgataaagaaggctatcgtgatgaagaatttgaaaaagaggac 
aaagggacatggattattaattcagaaatgaatattcagaaaaaagatcaagcgatgaaa 
tctagaggtatggttttgtatatgaatagaaatactagaaagacgactggtcatttttat 
45 acaaatataattacagaagataaaaaagggagagtgcacagtaaagataaagaatatccg 
gttcgccttaaaaacaataaaattgaaccgactaaacctatcgccgatgaaaaattaaaa 
aatgaaattaaaaactttcagtttttctctcaatatgggaattttaaaaatttaaaagac 
tacaagaatggaaatgtgtcttataacccaaacgtaccaagctattcggcagagtaccaa 
ttaagtaatgaagatgacaatgtgaagcaactcagaaagaggtatgatattccgattaag 
agagctcctaaactaatattaaaaggggacggtgaccttaaaggttcatctataggttat
aaagatatcgagttttcttttgtcgacaataaagaagaaagcgtctactttgcggatagt
ttggaatttaatccaagtgaggtaaataatgagtag

Sequence 678 

MYTIIERCEKMKYYGKCISYISILILTFFIGGCGFMNKENNKEAEIKENFNKTLSMYPIK
NLEDLYDKEGYRDEEFEKEDKGTWIINSEMNIQKKDQAMKSRGMVLYMNRNTRKTTGHFY 
TNIITEDKKGRVHSKDKEYPVRLKNNKIEPTKPIADEKLKNEIKNFQFFSQYGNFKNLKD 
YKNGNVSYNPNVPSYSAEYQLSNEDDNVKQLRKRYDIPIKRAPKLILKGDGDLKGSSIGY
KDIEFSFVDNKEESVYFADSLEFNPSEVNNE* 
Sequence 679

Contig_0503_pos_4433_4828,
is similar to (with p-value 6.0e-30) 
>sp:sp|Q02499|KPYK_BACST PYRUVATE KINASE (EC 2.7.1.40) (PK).
>pir:pir|S29783|S29783 pyruvate kinase (EC 2.7.1.40) (version 2) -
Bacillus stearothermophilus >gp:gp|D13095|BACPK_3 B.
stearothermophilus phosphofructokinase and pyruvate kinase genes.
NID: g285620.

10 atgcctttttctaaacctataattgcgcttggtgaagtaataccattttcttctgtaatt 
agacctatagctttttcaacatatggtactaatgtttcatcaacagaatttgtaataata 
actttatcagataaatctttaccttctaaatcactagcactatctgcgacaattgcatgg 
cctacaacagatcctctaccaacaccttggcctttagcaatctcatcacctactaagtgg 
attttcatcatatttgtagttcctttttctccagtaggtacaccagcagtaataataatt 
15 aaatctccgtttgaaactctaccagtttctactgctgttgctacagcattatttagtaaa 
gcatcagttgttttacgtccttctttaacgacctga 

Sequence 680 

MPFSKPIIALGEVIPFSSVIRPIAFSTYGTNVSSTEFVIITLSDKSLPSKSLALSATIAW 
PTTDPLPTPWPLAISSPTKWIFIIFVVPFSPVGTPAVIIIKSPFETLPVSTAVATALFSK
ASVVLRPSLTT* 

Sequence 681 

Contig_0503_pos_4881_5186,

is similar to (with p-value 4.0e-31)

>sp:sp|P43659|SMPB_ENTFA SMALL PROTEIN B HOMOLOG.
>gp:gp|M90060|STRATPASEA_1 Streptococcus faecalis H+ ATPase a (atpB),
b (atpF), c (atpE), alpha (atpA), beta (atpD), gamma (atpG),
delta (atpH), and epsilon (atpC) subunits, complete cds.
NID: g153565.

gtgagacgaggcgaaatgtacctgaataatatgcatattgcaccatatgaagaagggaac 
cgttttaatcatgaccctttacgtacacgtaaattactcttgcacaaaaaagaaattcaa 
aaattaggtgagcgtacacgagaaataggttattctattattccgttgaagttatattta 
aaacatggtcaatgtaaagttttattaggcgttgctagaggtaaaaagaaatacgacaaa 
35 cgtcaagcacttaaagaaaaagcggtaaaacgagatattgatcgcgcagttaaagcccgt 
tattaa 

Sequence 682 

VRRGEMYLNNMHIAPYEEGNRFNHDPLRTRKLLLHKKEIQKLGERTREIGYSIIPLKLYL 
KHGQCKVLLGVARGKKKYDKRQALKEKAVKRDIDRAVKARY*

Sequence 683 

Contig_0503_pos_2851_1928, 

putative peptide of unknown function 

45 atgaagcctaaagtattgttagcaggtggcaccggctatattggtagatatttaagtcga 
gtcattgaacatgatgctcaattatttgctttatctaaatatcccaaacctgacaaagga 
tctacgaacaaaatcacatggttaaaacgcgatatatataatcataaagatgtagttgaa 
gctatgaaaggtatagacattgcggtatattatttagatccaactaaacattctgctaaa 
ttaacacatgcaacagcacgagatttaaactttatagcggcagataactttggtagagct 

50 gcatcaataaataagctgaaaaagattgtgtatattccggggagccgtcatgataatgaa 
gctattgaacgtttaggcgcttatggcgtaccagttgattgtacggatgttgaagtgaaa 
cgtcctcatattaacgtagaattacaaacagctaaatatgatgatgttcgaacagcgatg 
aagatgattttaccaaagaaatggacgctcaatcaacttgtagactattttagtaggtgg 
ttagatgagacaaaaggaacttttgtacatactcaaaaacaagatcatcactacatcatt 

55 tacaataggaacattaaaagacctttagctattttcaaaatggttaatacaacagaagat 
ataattacattacatcttgt tgatggtaaattgatgaaacctaaatcaaagaagcaagca 
aaattagaatttagacttcttaaaggaacacgtttaattatggttcatttatacgattat 
atccctagattattttggccaatttactatttcatacaagcaccgattcaaggacttctt 
atgagaggttttgaaattgattgtagaattaagcattatcaaggtcgcattcaatcaggt 
gagaagattaaatatactaaataa
Sequence 684 

MKPKVLLAGGTGYIGRYLSRVIEHDAQLFALSKYPKPDKGSTNKITWLKRDIYNHKDVVE
AMKGIDIAVYYLDPTKHSAKLTHATARDLNFIAADNFGRAASINKLKKIVYIPGSRHDNE
AIERLGAYGVPVDCTDVEVKRPHINVELQTAKYDDVRTAMKMILPKKWTLNQLVDYFSRW
LDETKGTFVHTQKQDHHYIIYNRNIKRPLAIFKMVNTTEDIITLHLVDGKLMKPKSKKQA
KLEFRLLKGTRLIMVHLYDYIPRLFWPIYYFIQAPIQGLLMRGFEIDCRIKHYQGRIQSG 
EKIKYTK* 


Sequence 685 

Contig_0503_pos_1660_1235,

is similar to (with p-value 3.0e-21) 

>sp:sp|P42421|YXDJ_BACSU HYPOTHETICAL 26.6 KD SENSORY TRANSDUCTION
PROTEIN IN IDH 3' REGION. >gp:gp|D14399|BACIOLO_11 Bacillus subtilis
15 kb chromosome segment contains the iol operon. NID: g709980.
>gp:gp|Z99124|BSUB0021_70 Bacillus subtilis complete genome
(section 21 of 21): from 3999281 to 4214814. NID: g2636442.

20 atggatcaagtgatgagtatggaacttggtgcagatgattatatgcaaaaaccattttat 
acaaacgtcttaattgctaagctacaagctatttatagacgcgtttatgaatttggagtt 
gaagaaaagagaacgttaagttggcaagacgctactgtggatttatcaaaagatagtatt 
caaaaagatgataaaactatctttttgtctaaaacagagatgattattttagagatgtta 
atcaataaacgtaatcaaatcgtgacacgagacactctcattactgctttgtgggatgat 
25 gaagcttttgttagtgataatactttaacagttaatgttaatagattaagaaaaaaatta 
tcagaaattgacatggatagtgcaattgaaaccaaagttggtaaaggatacttagctcat 
gaataa 

Sequence 686 

MDQVMSMELGADDYMQKPFYTNVLIAKLQAIYRRVYEFGVEEKRTLSWQDATVDLSKDSI
QKDDKTIFLSKTEMIILEMLINKRNQIVTRDTLITALWDDEAFVSDNTLTVNVNRLRKKL
SEIDMDSAIETKVGKGYLAHE*

Sequence 687 
Contig_0503_pos_831_202,

putative peptide of unknown function 

atgaaattattgatagatcaagagaatgatgatcagcgtaagcgagcgttattatttgaa 
tggtctcgtattaatgagatgttagataagcaattatatttaacaaggcttgaaacacat 
catcgtgatatgtattttgattatatttcattaaagagaatggttatagatgaaatacaa 

40 gttactcgacatatcagtcaggcaaaagggataggttttgaattagattttaaagacgaa 
caaaaggtttatacagatgttaaatggtgccgtatgatgattaggcaagttctatctaac 
tctttgaaatatagtgataattctacaataaatttaagtggttataacatagaaggacac 
gttgttttaaaaattaaagactacggtcgtggaattagtaaaagagatttaccacgtata 
tttgatagaggatttacttctacaacagaccgcaacgatactgcgtcttctggtatggga

45 ttataccttgtacaaagcgtgaaagaacaacttgggattgaagttaaagttgattcaata 
gtggggaaaggaacaacgttttatttcattttcccacaacaaaatgaaatcattgagcgc 
atgtctaaagtgacaagattgtcattttaa 

Sequence 688 

MKLLIDQENDDQRKRALLFEWSRINEMLDKQLYLTRLETHHRDMYFDYISLKRMVIDEIQ
VTRHISQAKGIGFELDFKDEQKVYTDVKWCRMMIRQVLSNSLKYSDNSTINLSGYNIEGH 
VVLKIKDYGRGISKRDLPRIFDRGFTSTTDRNDTASSGMGLYLVQSVKEQLGIEVKVDSI 
VGKGTTFYFIFPQQNEIIERMSKVTRLSF*

Sequence 689

Contig_0505_pos_3663_4214,

is similar to (with p-value 3.0e-56) 

>gp:gp|AF012285|AF012285_31 Bacillus subtilis mobA-nprE gene region.
NID: g3282109. >gp:gp|Z99111|BSUB0008_128 Bacillus subtilis complete
genome (section 8 of 21): from 1394791 to 1603020. NID: g2633699.

atgataacaatgaaagatattataagagatggtcatccaacacttcgtgaaaaagcgaaa 
gaattaagcttcccactttctaacaatgataaagaaacattgcgcgcaatgcgtgaattt 
5 ctaatcaatagtcaggatgaagaaaccgcaaaacgttatggtttacgttctggcgtaggt 
ttagctgctccacaaattaatgaaccaaaacgtatgattgctgtctacttacctgatgat 
ggaaacggtaaatcgtatgattatatgctcgtaaatcctaaaataatgagttacagtgta 
caagaagcttatttaccaactggcgaaggttgtctaagtgttgatgaaaacatcccaggt 
ttagtgcatcgtcatcatagagtcactattaaagctcaagatattgatggaaatgatgtt 
10 aaattacgtctcaaaggctatcctgcaattgtatttcaacacgaaattgatcatctaaat 
ggcattatgttttatgattatattgatgccaatgaacctctaaaaccacatgaagaggcc 
gtagaagtctaa 

Sequence 690 

MITMKDIIRDGHPTLREKAKELSFPLSNNDKETLRAMREFLINSQDEETAKRYGLRSGVG
LAAPQINEPKRMIAVYLPDDGNGKSYDYMLVNPKIMSYSVQEAYLPTGEGCLSVDENIPG 
LVHRHHRVTIKAQDIDGNDVKLRLKGYPAIVFQHEIDHLNGIMFYDYIDANEPLKPHEEA 
VEV* 

Sequence 691

Contig_0505_pos_5404_6018, 

is similar to (with p-value 1.0e-66) 

>sp:sp|Q45493|YKQC_BACSU HYPOTHETICAL 61.5 KD PROTEIN IN ADEC-PDHA
INTERGENIC REGION. >gp:gp|AF012285|AF012285_29 Bacillus subtilis
mobA-nprE gene region. NID: g3282109. >gp:gp|Z99111|BSUB0008_125
Bacillus subtilis complete genome (section 8 of 21): from 1394791 to
1603020. NID: g2633699.

atggctcaattaggtcatgaaggtgtgttgtgcttactatcagactcaacaaacgcacta 
gttccagattttactttaagtgaacgtgaagttggacagaatgtcgataaaattttcaga 

30 aattgtaaggggcgtattatctttgcaacttttgcttctaatatttatcgtgttcagcaa 
gcagttgaagcagcaattaaatataatcgtaaaatcgttacatttggacgttcaatggaa 
aacaatatcaaaattggtatggaactaggatatatcaaagcgccaccagaaacgtttata
gaacctaataaaataaatagtgtacctaaacacgagttactcattctttgtactggttct
caaggtgaacctatggctgcattatcaagaattgcaaatggtacacataagcaaataaaa 

attataccggaagacactgtagtatttagttcttcgcctattccaggtaacactaagagt
atcaatcgtacaattaatgcgttgtacaaagctggtgcagatgtgattcatagtaaaatt 
tcaaacatcttgcacaattttagatgctacttcgtttccccatatacaagctttttcaat 
actttggttatctag 

Sequence 692

MAQLGHEGVLCLLSDSTNALVPDFTLSEREVGQNVDKIFRNCKGRIIFATFASNIYRVQQ 
AVEAAIKYNRKIVTFGRSMENNIKIGMELGYIKAPPETFIEPNKINSVPKHELLILCTGS 
QGEPMAALSRIANGTHKQIKIIPEDTVVFSSSPIPGNTKSINRTINALYKAGADVIHSKI
SNILHNFRCYFVSPYTSFFNTLVI* 
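The ORF labels used throughout, such as Contig_0505_pos_5404_6018, appear to carry a contig number followed by two coordinates. For that label the span 6018 - 5404 + 1 = 615 equals the length of the nucleotide record listed under it (204 codons plus a stop). The Python sketch below parses a label on that assumption. Reading a descending coordinate pair as the reverse strand is an inference and is not stated in the listing, and `parse_orf_id` and `OrfId` are hypothetical names.

```python
import re
from typing import NamedTuple


class OrfId(NamedTuple):
    contig: str   # contig number as written, e.g. "0505"
    start: int    # first coordinate in the label
    end: int      # second coordinate in the label
    strand: str   # "+" if ascending, "-" if descending (assumed meaning)
    length: int   # inclusive span in nucleotides


_LABEL = re.compile(r"Contig_(\d+)_pos_(\d+)_(\d+)")


def parse_orf_id(label: str) -> OrfId:
    """Parse a label of the form Contig_<n>_pos_<a>_<b>."""
    match = _LABEL.search(label)
    if match is None:
        raise ValueError(f"not an ORF label: {label!r}")
    contig = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    strand = "+" if start <= end else "-"
    return OrfId(contig, start, end, strand, abs(end - start) + 1)


if __name__ == "__main__":
    orf = parse_orf_id("Contig_0505_pos_5404_6018,")
    # A span divisible by three is consistent with a complete ORF.
    print(orf, orf.length % 3 == 0)
```

Comparing the computed span with the length of the nucleotide record is a quick way to spot coordinates damaged by text extraction.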


Sequence 693 

Contig_0505_pos_3483_2857,

putative peptide of unknown function 

atgaattttaaaaagactgtagcaattgtcctaacgtcagcagtgttattagctggatgt 
actatagataaaaaagaaattaaaaaatatgatgatcaagtacaaaaagctatggaccaa
gagaaaaccgttaatcaagtaagtaaaaaaataaacgaattagaagagaaaaagcaaaaa 
ttatttaaaaaggtaaatgataaagatcaaagcacacgtaaaaaagcagctgaagatata 
gttgaaaatgtaaaacaaagacaaaaagaatttgaaaaagaagagaaggctctagataat 
tctgaaaaagcatttaaacaagccaagcaatatcttgaacatgtagaaaacaaagcaaag 
aaaaaagaagttgaacaacttgatagtgctattaaagaaaaatataaatcacatgatgct
tatgcaaaggcttacaaaaaagcacttaataaggaaaaagaactgttttcttatttgaat 
gaagataatgcaacacaatcggaagtagacggaaaatcgaaagatctttctaaagcatat
aaagaaatgaataataaatttaatgcttactcaaaagccattgagaaagtaaaaagagaa 
aaacaagatgtagaccaattaaaataa 






Sequence 694 

MNFKKTVAIVLTSAVLLAGCTIDKKEIKKYDDQVQKAMDQEKTVNQVSKKINELEEKKQK 
LFKKVNDKDQSTRKKAAEDIVENVKQRQKEFEKEEKALDNSEKAFKQAKQYLEHVENKAK 
KKEVEQLDSAIKEKYKSHDAYAKAYKKALNKEKELFSYLNEDNATQSEVDGKSKDLSKAY
KEMNNKFNAYSKAIEKVKREKQDVDQLK*

Sequence 695 

Contig_0505_pos_2686_1574, 

is similar to (with p-value 0.0e+00)

>pir:pir|S10798|DEBSPF pyruvate dehydrogenase (lipoamide) (E
C 1.2.4.1) alpha chain - Bacillus stearothermophilus >gp:gp|
X53560|BSPDMC_3 B. stearothermophilus pdhA, pdhB, pdhC, pdhD
genes for pyruvate dehydrogenase multienzyme complex (E.C.
numbers 1.2.4.1, 2.3.1.12, 1.8.1.4). NID: g40038.

atggctcctaagttacaagcccaattcgatgcagttaaagttttaaatgagactcaatcg 
aaatttgaaatggttcaaattttggatgaagacggaaatgtcgttaatgaagacttagta 
cctgatttaacagacgaacaattagtggaattaatggaaagaatggtatggactagaatt 
cttgatcaacgttctatttcgttaaatagacaaggacgtttaggtttctatgcaccaaca 

gcaggacaagaagcttcacaattagcatctcagtatgctttagaaagtgaagacttcatt
ttacctggttatcgtgatgtgcctcagattatttggcatggcttacctcttacagacgca 
ttcttattctcaagaggacacttcaaaggtaaccaattccctgagggagttaatgcactt 
agccctcaaattattatcggtgcacaatatattcaaactgccggtgtagcgtttggactt 
aaaaaacgtggcaaaaatgcagtcgcaattacttatacaggtgatggtggttcatcacaa

ggtgacttctatgaaggaattaactttgcatctgcatacaaagcacctgcaatttttgta
attcaaaacaataactatgccatctctacaccacgtagtaaacaaacagctgcagaaaca 
ttagcacaaaaggctatttcagttggtatccctggaattcaagttgatggtatggatgct 
ttagctgtttatcaagcaacattagaagcacgtgaacgtgcagtagcaggagaaggtcct 
actgttatcgaaactttaacttatcgttatggaccacatactatggctggtgatgatcct 

actcgttatagaacttcagatgaagatgctgaatgggagaaaaaagacccattagtacgt
ttcagaaaatatttagaagctaaaggtctttggaatgaagacaaagaaaatgaagtggtt
gaacgtgcaaaatctgaaataaaagcagctattaaagaggctgacaatacagaaaaacaa 
actgttacttctctaatggatatcatgtatgaagaaatgcctcaaaatttagcagaacaa 
tatgaaatttacaaagagaaggagtcgaagtaa 


Sequence 696 

MAPKLQAQFDAVKVLNETQSKFEMVQILDEDGNVVNEDLVPDLTDEQLVELMERMVWTRI
LDQRSISLNRQGRLGFYAPTAGQEASQLASQYALESEDFILPGYRDVPQIIWHGLPLTDA 
FLFSRGHFKGNQFPEGVNALSPQIIIGAQYIQTAGVAFGLKKRGKNAVAITYTGDGGSSQ 
GDFYEGINFASAYKAPAIFVIQNNNYAISTPRSKQTAAETLAQKAISVGIPGIQVDGMDA
LAVYQATLEARERAVAGEGPTVIETLTYRYGPHTMAGDDPTRYRTSDEDAEWEKKDPLVR 
FRKYLEAKGLWNEDKENEVVERAKSEIKAAIKEADNTEKQTVTSLMDIMYEEMPQNLAEQ
YEIYKEKESK* 

Sequence 697

Contig_0505_pos_1570_593, 

is similar to (with p-value 0.0e+00) 

>pir:pir|C36718|C36718 pyruvate dehydrogenase (lipoamide) (E
C 1.2.4.1) E1 beta chain precursor - Bacillus subtilis >gp:g
p|AF012285|AF012285_34 Bacillus subtilis mobA-nprE gene regi
on. NID: g3282109. >gp:gp|M57435|BACPYDHY_3 B. subtilis pyruv
ate dehydrogenase complex genes, complete cds; PAL-related l
ipoprotein (slp) gene, complete cds, lysine decarboxylase (c
ad) gene, partial cds. NID: g143375. >gp:gp|Z99111|BSUB0008_
131 Bacillus subtilis complete genome (section 8 of 21): fro
m 1394791 to 1603020. NID: g2633699.

atggcacaaatgacaatggttcaagcgattaacgatgcgcttaaaagtgaactcaaaaga 
gacgaagacgttttagttttcggtgaagacgttggtgttaacggtggtgtattccgtgtt 
actgaaggtttacaaaaagaatttggcgaagatcgagtatttgatacaccattagcagag
tctggaattggtgggcttgcactaggcttagcagtgactggcttccgtcctgttatggaa
attcaattcttaggattcgtttatgaagtatttgacgaagtagctggtcaaattgctcgt 
actcgtttccgttcaggtggaactaaaccagcgcctgttacaattcgtacaccttttggt 
ggtggcgtccacactccagagttgcatgctgataatttagaaggtatcttagctcaatca 
cctggtttgaaagtagttattccatcaggtccttatgatgctaaaggattattaatttct
tctattcaaagtaatgatccagttgtatatctagaacatatgaaattatatcgttctttc 
cgtgaagaggttcctgaagaagaatacaaaattgacattggaaaagccaatgttaaaaaa 
gaaggtaatgatattactctaatatcttacggggcaatggtacaagaatcactaaaagct 
gctgaagagttagaaaaagatggttattcagttgaagttattgacttacgtactgtacaa
ccaattgatatagatactttagtagcatcagttgagaaaactggacgtgctgtagttgta
caagaagcacaacgtcaagctggtgtgggtgcacaagtggcagcagaattagcagagcga 
gcaattctttcattagaagctccaatagctcgagtagccgcatcagatacaatttatcca 
tttactcaagctgaaaacgtttggttaccaaataaaaaagatattatagagcaagctaag 
gcaactttagaattctaa 


Sequence 698 

MAQMTMVQAINDALKSELKRDEDVLVFGEDVGVNGGVFRVTEGLQKEFGEDRVFDTPLAE 
SGIGGLALGLAVTGFRPVMEIQFLGFVYEVFDEVAGQIARTRFRSGGTKPAPVTIRTPFG 
GGVHTPELHADNLEGILAQSPGLKVVIPSGPYDAKGLLISSIQSNDPVVYLEHMKLYRSF
REEVPEEEYKIDIGKANVKKEGNDITLISYGAMVQESLKAAEELEKDGYSVEVIDLRTVQ
PIDIDTLVASVEKTGRAVVVQEAQRQAGVGAQVAAELAERAILSLEAPIARVAASDTIYP 
FTQAENVWLPNKKDIIEQAKATLEF*

Sequence 699 
Contig_0505_pos_462_151,

is similar to (with p-value 8.0e-44) 

>sp:sp|Q59821|ODP2_STAAU DIHYDROLIPOAMIDE ACETYLTRANSFERASE
COMPONENT (E2) OF PYRUVATE DEHYDROGENASE COMPLEX (EC 2.3.1.1
2). >pir:pir|S19722|S19722 dihydrolipoamide S-acetyltransfer
ase (EC 2.3.1.12) chain E2 - Staphylococcus aureus >gp:gp|X5
8434|SAPDHDNA_2 S. aureus pdhB, pdhC and pdhD genes for pyruv
ate decarboxylase, dihydrolipoamide acetyltransferase and di
hydrolipoamide dehydrogenase. NID: g48871.

gtggcatttgaatttagattacccgatatcggggaaggtatccacgaaggtgaaattgtt 
aaatggtttattaaagccggcgatacaattgaagaagatgatgtattagcagaagttcaa
aatgataaatctgtagtagaaattccttctccagtaagtggtactgttgaagaagtgtta 
gtagatgaaggaacagtggcagtagtaggagatgtcatcgttaaaattgatgcacctgat 
gcagaagaaatgcaatttaaaggtcatggcgatgatgaggattctaagaaagaagaaaaa 
gaaatgatttga 


Sequence 700 

VAFEFRLPDIGEGIHEGEIVKWFIKAGDTIEEDDVLAEVQNDKSVVEIPSPVSGTVEEVL 
VDEGTVAVVGDVIVKIDAPDAEEMQFKGHGDDEDSKKEEKEMI*

Sequence 701

Contig_0506_pos_1522_2664, 

is similar to (with p-value 0.0e+00) 

>sp:sp|Q45493|YKQC_BACSU HYPOTHETICAL 61.5 KD PROTEIN IN ADE
C-PDHA INTERGENIC REGION. >gp:gp|AF012285|AF012285_29 Bacill
us subtilis mobA-nprE gene region. NID: g3282109. >gp:gp|Z99
111|BSUB0008_125 Bacillus subtilis complete genome (section
8 of 21): from 1394791 to 1603020. NID: g2633699.
atggctcaattaggtcatgaaggtgtgttgtgcttactatcagactcaacaaacgcacta 
gttccagattttactttaagtgaacgtgaagttggacagaatgtcgataaaattttcaga 

aattgtaaggggcgtattatctttgcaacttttgcttctaatatttatcgtgttcagcaa
gcagttgaagcagcaattaaatataatcgtaaaatcgttacatttggacgttcaatggaa 
aacaatatcaaaattggtatggaactaggatatatcaaagcgccaccagaaacgtttata 
gaacctaataaaataaatagtgtacctaaacacgagttactcattctttgtactggttct 
caaggtgaacctatggctgcattatcaagaattgcaaatggtacacataagcaaataaaa






attataccggaagacactgtagtatttagttcttcgcctattccaggtaacactaagagt 
atcaatcgtacaattaatgcgttgtacaaagctggtgcagatgtgattcatagtaaaatt 
tcaaacattcacacttctggacacggttctcaaggtgatcaacaattaatgttacgtctg 
attcaacctaaatacttcctgccaattcacggtgaatatcgtatgcttaaagctcatggt 
gagactggtgttcaatgcggtgttgatgaagataatgtatttattttcgatatcggtgat
gtacttgctttaacacatgattctgcacgaaaagcaggaagaattccttccggcaatgta 
cttgttgatggcagtggtataggtgatattggcaatgttgtcatcagagatcgtaaatta 
ctttcagaagaagggttagttattgttgttgtgagcattgactttaatactaacaaatta 
ctatctggccctgatattatttcacgcggttttgtttatatgcgggaatctggtcaatta 
atttatgatgctcaacgtaaaattaaaggcgatgtcatttctaaacttaacagcaataaa
gatattcaatggcatcaaattaaatcttcaattatcgaaacattacatccttatctttat 
gaaaaaacagctcgaaaacctatgattttacctgtgataatgaaagtaaatgaagataaa 
taa 

Sequence 702

MAQLGHEGVLCLLSDSTNALVPDFTLSEREVGQNVDKIFRNCKGRIIFATFASNIYRVQQ
AVEAAIKYNRKIVTFGRSMENNIKIGMELGYIKAPPETFIEPNKINSVPKHELLILCTGS 
QGEPMAALSRIANGTHKQIKIIPEDTVVFSSSPIPGNTKSINRTINALYKAGADVIHSKI 
SNIHTSGHGSQGDQQLMLRLIQPKYFLPIHGEYRMLKAHGETGVQCGVDEDNVFIFDIGD 

VLALTHDSARKAGRIPSGNVLVDGSGIGDIGNVVIRDRKLLSEEGLVIVVVSIDFNTNKL
LSGPDIISRGFVYMRESGQLIYDAQRKIKGDVISKLNSNKDIQWHQIKSSIIETLHPYLY
EKTARKPMILPVIMKVNEDK*

Sequence 703 
Contig_0507_pos_639_1073,

putative peptide of unknown function 

atgaagcaagctttagaaaaatatttacaagcgaatagcgatgtacttgataataagtat 
gtcatgcaacataaattagataaacaaagtgatagtaatcctaaaatcacagaatcacaa 
gctgatcgtcttagcaagttatccaatttagcagttaagaacgatttacatttcaaaaaa 
tttataaaaaacaatcacatccctgaagaatataaagatccaacagatcgcataattaat
tattttcacgctttaaatagtaccatttcaaatgtagatgaagacattgagaaattaaac 
taccaaccacaaaattcaattaacgttgttgatgtagccacaaaatattcaggtgatgta 
aataaaaaacaacaagataaaattactactttccttaagaaaaaaggaatagacacagaa 
gtatttaataaataa 


Sequence 704 

MKQALEKYLQANSDVLDNKYVMQHKLDKQSDSNPKITESQADRLSKLSNLAVKNDLHFKK 
FIKNNHIPEEYKDPTDRIINYFHALNSTISNVDEDIEKLNYQPQNSINVVDVATKYSGDV 
NKKQQDKITTFLKKKGIDTEVFNK*


Sequence 705 

Contig_0507_pos_1278_1670,

putative peptide of unknown function 

atgaggaaaatcattatgaagatacgtttaacatttattatcttagcaatactatccacc 
atcggcttagtacttgttttagcaaaatatccaacaggcccacacacaatcaactataac
gaaccttatacagtactcatagccattacgacaatagttataatggctttaccagcactc 
atattaggtatatttaatcatcttgcatgtagaatcatatcggcgatattacaaataagt 
gcactgatgatgtgggggtttttagtaatcattagcttaattatgggacaaattgtcatt 
atgcttatggcttccttaacgatacttgcattacttgttagttctattgtcacactttca
gtgcacccatctacttcagataaaataaattaa

Sequence 706 

MRKIIMKIRLTFIILAILSTIGLVLVLAKYPTGPHTINYNEPYTVLIAITTIVIMALPAL
ILGIFNHLACRIISAILQISALMMWGFLVIISLIMGQIVIMLMASLTILALLVSSIVTLS
VHPSTSDKIN*

Sequence 707 

Contig_0507_pos_3386_2754,

is similar to (with p-value 2.0e-17) 
>gp:gp|U93874|BSU93874_16 Bacillus subtilis cysteine synthas
e (yrhA), cystathionine gamma-lyase (yrhB), YrhC (yrhC), Yrh
D (yrhD), formate dehydrogenase chain A (yrhE), YrhF (yrhF),
formate dehydrogenase (yrhG), YrhH (yrhH), regulatory prote
in (yrhI), cytochrome P450 102 (yrhJ), YrhK (yrhK), hypothet
ical protein YrhL (yrhL), putative anti-SigV factor (yrhM),
RNA polymerase sigma factor SigV (sigV) and YrhO (yrhO) gene
s, complete cds, and YrhP (yrhP) gene, partial cds. NID: g19
34604.

atggacggtttaattacatttatcattatcacattattgattattatagtacctgggcca
gatttcattattgtaatgaaaaacactattaattcaagtaaaatgaatggttttatggct 
gcatttggtattactacggggcatattttatactcttcattagctatttttggaattata 
tacatacttacgagtttacactttgtttttttaacaataaaaatattgggtgcttgttat 
cttatttatctcggaatcaaaagtattttgagtgcgcacagttctgttgattttagtaaa 

caagctttagctgatgtcagaaatgtgagttatatcacttcttttagacaaggtttttta
agtacaagtcttaaccctaaggctttattattttatgttagtatattcccgcagttcctt 
tctaacggtaatatacatatgaaatctgaagttgcgttatttgctttttcagttgttgta 
gttatatgcttatggtttttattttgtgtattcatctttcaatatattaaattattattc 
agcagaccgagattcaaggctatatttgattatattgtagggtttgttttaattggctta 

tctattaatttattattaagtaaaagtagctaa

Sequence 708 

MDGLITFIIITLLIIIVPGPDFIIVMKNTINSSKMNGFMAAFGITTGHILYSSLAIFGII
YILTSLHFVFLTIKILGACYLIYLGIKSILSAHSSVDFSKQALADVRNVSYITSFRQGFL
STSLNPKALLFYVSIFPQFLSNGNIHMKSEVALFAFSVVVVICLWFLFCVFIFQYIKLLF
SRPRFKAIFDYIVGFVLIGLSINLLLSKSS*

Sequence 709 

Contig_0508_pos_3124_1019, 

putative peptide of unknown function

atgacatgttttttaaatatatataaagaatatcattctccccctgaacgcaatattaat 
agaattattctgatgttttcacttacaaatgacttaaaaatcactattaatggtgaaaca 
aaagatttaggtaatcatatagctatcattaatcaatctgacatctattttattaatagt 
gcttcaaatctcgtattactctctattccagttatttatttttatagtaaagataataaa

ttttttaaatgttattttgacagacatttattacaatcaagcagttttgttaaaacaatt
attttacaagctattcaacatttaataaaaggagaaaatcaagatgagcaatccatctct 
aaaataatacaaacgctactaaaagaagcagtcattcgatataagaaaaaatatattcct 
caaattgcagttaatcattcagtgtttactgaaggattaacatttattcattcaaaagta 
tcacagtcactttcactacgagaagtagctcaacattgtaatatatctgaatcttattgt 

tctaacctatttgcaagatatcttaatatgaattttaaagattattttacaagcttaaaa
gtgattgactctataaaaaggctactttcatctgaggactcaattaacgctatttcagaa 
caatctggatttagtagtcataccaattttacaaatcaatttaaaaattatttaggttgt
agcccaaaacaataccgaacgattatctctaagttagactccttaccttcgataagtttt 
agtgatactgacttttcacaatatattgatttaattaatcaatttgagtttagtgatcat 

ttggctactgaaacgactgaaagagatatcaatgaattttatcctcaagatcagactaaa
aactctaaagcgtttatacgttttcaaaatttcaacgaattatttcaatttgtttttaat 
gaatattacaacattgattttacctccctgccacaagctgtaatttttatcaatgatatc 
actgatatttcgacgcgagaggtgaactttaatttattaaatcgatgctttgaaaaatta 
ttcgaaaaaaatataggtttagccatgagattaacatctacaaatgaatttgaatctatc 

aaagaaataattttattatttcttaatagccaccaagattataaaatgaacaaaaaaatg
gttaaatttatgttagtctttgaaactaagaatatgtcagtgaacgatatacatttatgc 
catttaaaaattaaaaataaaaataaagctatccgttatagtataactgttgaaggatta
ctacaccaaaattcttctattgatcgaacttatgatatgatgaagcgattaaatttcgat 
tactattttatagacattgaaaatctagaaaccaaaaactcattaattaccaaacgtaaa 

tcgtacttacattcgtccacacactttgaaaattataaacaatttatattagattccggt
ataccttcaactaaatttgtttacaataatttatctttaaagtgttttaaatatacaaac 
aatggtacttatccacttcaattatctgaccttgtttgtcatttagtcgcattaatgcgt 
tacgggggtggtgtaagttatcaactgatagaagatgagagtccttttattgccttattt 
aatcgttatggtagtcccctacctctcatgcacctctataaattaatcgaaccattttta 
aatgaacctttagagatagctaacaattttttaatgagtcgcaaagatggtaactatcac
tttttattatttaataaaataaatgatcgttatctatctgatagtcaacagcgctacgtt 
tttaaaaatacattatcaaccaactcattaattattattaaaacgttaaatcatgagcat 
ggcgcaattcaaaaccttctaccacaaactaaacaacaattttatattgaacgtagtatt 
cttgatgaacttgataaatcaaatcaaccaaaaacagaattagctatacaacatgaccat
catcttcctttccaagtcaccttaaaacacgatgaagtcaaatatatttgttttaaacct 
tcttaa 



Sequence 710 

MTCFLNIYKEYHSPPERNINRIILMFSLTNDLKITINGETKDLGNHIAIINQSDIYFINS
ASNLVLLSIPVIYFYSKDNKFFKCYFDRHLLQSSSFVKTIILQAIQHLIKGENQDEQSIS
KIIQTLLKEAVIRYKKKYIPQIAVNHSVFTEGLTFIHSKVSQSLSLREVAQHCNISESYC
SNLFARYLNMNFKDYFTSLKVIDSIKRLLSSEDSINAISEQSGFSSHTNFTNQFKNYLGC
SPKQYRTIISKLDSLPSISFSDTDFSQYIDLINQFEFSDHLATETTERDINEFYPQDQTK

NSKAFIRFQNFNELFQFVFNEYYNIDFTSLPQAVIFINDITDISTREVNFNLLNRCFEKL
FEKNIGLAMRLTSTNEFESIKEIILLFLNSHQDYKMNKKMVKFMLVFETKNMSVNDIHLC
HLKIKNKNKAIRYSITVEGLLHQNSSIDRTYDMMKRLNFDYYFIDIENLETKNSLITKRK
SYLHSSTHFENYKQFILDSGIPSTKFVYNNLSLKCFKYTNNGTYPLQLSDLVCHLVALMR 
YGGGVSYQLIEDESPFIALFNRYGSPLPLMHLYKLIEPFLNEPLEIANNFLMSRKDGNYH 

FLLFNKINDRYLSDSQQRYVFKNTLSTNSLIIIKTLNHEHGAIQNLLPQTKQQFYIERSI
LDELDKSNQPKTELAIQHDHHLPFQVTLKHDEVKYICFKPS* 

Sequence 711 

Contig_0508_pos_988_524, 
is similar to (with p-value 2.0e-25)

>sp:sp|P80238|GS26_BACSU GENERAL STRESS PROTEIN 26. >gp:gp|A
B001488|AB001488J7 Bacillus subtilis genome sequence, 148 kb
sequence of the region between 35 and 47 degree. NID: g1881
226. >gp:gp|Z99106|BSUB0003_69 Bacillus subtilis complete ge
nome (section 3 of 21): from 402751 to 611850. NID: g2632653

atgtgtgataaggatatatacaataataaggagatgataattttgaataaacaacaagtg 
acaaaagcaatagaaaaagtattaaattcttcaaaaattggtgtcctatcaacagcacat 
cataataaacctaatagcagatatatggtcttttacaacgatgacttaaacttatataca 
aaaacgaatatcaattcactaaaagtcgaagaaatagaaaataatcctgatgctcatatt
ttattaggctataacgaaacaacaaacaatagctttgttgaaatagatgccactatagaa 
gttgtcaaaaatcaaaaagttattgattggttatgggaaactcaagacaaaacatttttc 
aattcaaaagaagatcctgaattatgtgtactcaaagttatacctcgttcaattaaatta 
atgaatgatgatgaactagatacgccagctacaattgagttataa 


Sequence 712 

MCDKDIYNNKEMIILNKQQVTKAIEKVLNSSKIGVLSTAHHNKPNSRYMVFYNDDLNLYT
KTNINSLKVEEIENNPDAHILLGYNETTNNSFVEIDATIEVVKNQKVIDWLWETQDKTFF
NSKEDPELCVLKVIPRSIKLMNDDELDTPATIEL*


Sequence 713 

Contig_0509_pos_308_706,

putative peptide of unknown function 

atgatagaagaattgattaaccgtgaaaaaatgaattttggtgtaatcaatatactttta 
cagtttgttatgttaaaagaagatatgaagttgccaaaatcttatatttttgaaattgct
tccaactggaagaaaattggtatttcaaatgccaaacaagcatatgaatatgcattacaa 
gttaatcaacctaaaaattacgaaacacattctaatgataaacgacagaacaatcgtgga 
agacaaaatcaatttttatccaaagaaaagacacctaaatggcttcaaaatagggacgat 
caagaagaaaataaagaaataaatgatgacactctcgaagaagatcgacaagcatttctt 
gaaaagttaaatcaaaagtggaaggaggaagataactaa

Sequence 714 

MIEELINREKMNFGVINILLQFVMLKEDMKLPKSYIFEIASNWKKIGISNAKQAYEYALQ 
VNQPKNYETHSNDKRQNNRGRQNQFLSKEKTPKWLQNRDDQEENKEINDDTLEEDRQAFL 
EKLNQKWKEEDN*
Sequence 715 

Contig_0509_pos_727_1194,
putative peptide of unknown function

atgggcgattctcaaaatctagataaacgtatacaaaaaataaaacaaaatgtaatcaat 
gatactgacgttaaacattttcttgagaaaaatcgtagtaatataactaatgagatgata 
gacgaagatttaaatgttcttcaagagtataaagatcaacaaaaagtttatgatggacat 
cgctatgatgattgtccgaattttgtaaaaggacatgttcctgaactatatattgaaaat 
gaaagaatcaaaattagatatctaccttgcccgtgtaaaattaaacatgatgaggaacga
tttgattcacaacttattacatctcaccatatgcaaagagatacacttcatgcaaagctc 
aaagatatttatatgaataatcgagagagacttgatgtagcaatggcagctgatcaaatc 
tgtacagcaattactcaaattaaaattagaaagttttatacgtcctaa 

Sequence 716

MGDSQNLDKRIQKIKQNVINDTDVKHFLEKNRSNITNEMIDEDLNVLQEYKDQQKVYDGH
RYDDCPNFVKGHVPELYIENERIKIRYLPCPCKIKHDEERFDSQLITSHHMQRDTLHAKL
KDIYMNNRERLDVAMAADQICTAITQIKIRKFYTS* 

Sequence 717

Contig_0509_pos_1283_2650, 

is similar to (with p-value 3.0e-20) 

>sp:sp|P10564|HEXA_STRPN DNA MISMATCH REPAIR PROTEIN HEXA. >
pir:pir|C28667|C28667 DNA mismatch repair protein hexA - Str
eptococcus pneumoniae >gp:gp|M18729|STRHEXA_3 S. pneumoniae m
ismatch repair protein (hexA) gene, complete cds. NID: g1536
54.

atggatacgttatttcataaaattaattttaatttcactgcaattggtgaaatgcgacta 
tatgcaactttaagaggtatgtttaaggtaaatcaaacctcattgataaacatgtttaaa 

gaaaataaagtatttcgtttaaatgtatcttacattctttctaaaattggtaaaaatgta
taccctttgtttccagatcaaatgttacccactaagcgaaatattttattaatgttttgt 
ccgttgttaccatttatcgggttcgcattcatttttttaattccttcaaaaggtatatta 
atatgtcttacttttatgattttaaatgcaatattatctttcaaactaaaaaaatcttat 
gaccaagatttaaaatcaattttttatactgctaatgttataaagcaaagtcaagcttta 

agtaagattgagagcacgcccgcgataagtgttgattttactcattttaaagcttcacgc
cgttttagtggtttattagctagagtagaatcacaagatatggcgagtagcataatcatg 
tttattaaattagtattcatgatagattatgttttatttcatttaatacaacgcagctac 
tttaagtatcaagaagaagttatgacatgttatgactacataagcatattagataatcat 
tactctatagctatgtatcaacatactttgacacattattgttatcctaaaatcaatcac 

aatattaatggtcttcaaatgaaatcaatcattcatcctctactagatgaagaaaatgcg
attgctaacacaattgacatttcaaatcatatattgctcacaggctctaatgcatcagga 
aaatctacatttatgaaagcagttgcactaaatttgattttagctcaatcgatacaaact 
gcaacagctcactcatttatttatcaacctggctatgtaatgacatcaatggcaaatgcg 
gatgacgttttaagtggtgacagttatttcatgtcagaacttaagtctattcgtagatta 

tttaacactcatcagtgcaataagatatattgttttatagatgaaatttttaaaagaacg
aatacaactgaacgtattgcggcttctgaatcagtattatcgtatttagataatcaaaaa 
gcatatcaggttatcgctgcgacacatgatgttgaattatcaacattattagaaaataca 
tataataattatcattttaatgaatcaattcaagaaaatagcatatttttcgattacaaa 
attaaaccaggtaaagccaatacacgtaatgcaattgaattactacgcattacgcagttt 

cctatcgatatttatcagcgtgctcaacaaaatattcgaaacctctag

Sequence 718 

MDTLFHKINFNFTAIGEMRLYATLRGMFKVNQTSLINMFKENKVFRLNVSYILSKIGKNV 
YPLFPDQMLPTKRNILLMFCPLLPFIGFAFIFLIPSKGILICLTFMILNAILSFKLKKSY
DQDLKSIFYTANVIKQSQALSKIESTPAISVDFTHFKASRRFSGLLARVESQDMASSIIM
FIKLVFMIDYVLFHLIQRSYFKYQEEVMTCYDYISILDNHYSIAMYQHTLTHYCYPKINH 
NINGLQMKSIIHPLLDEENAIANTIDISNHILLTGSNASGKSTFMKAVALNLILAQSIQT 
ATAHSFIYQPGYVMTSMANADDVLSGDSYFMSELKSIRRLFNTHQCNKIYCFIDEIFKRT
NTTERIAASESVLSYLDNQKAYQVIAATHDVELSTLLENTYNNYHFNESIQENSIFFDYK 
IKPGKANTRNAIELLRITQFPIDIYQRAQQNIRNL*
Sequence 719 

Contig_0509_pos_9141_8665, 
putative peptide of unknown function

atgctacgttattctctacgcacagcatcgcacacagtcaaattagcagaagaattaagc 
tacattgagcagtatgttgccatacaaaatatccgcttcgatgatatgatacagctttac 
atcgatgctactgagggtgtacaacatcaaacaattggtaagatgatgcttcaaccactc 
gtagaaaatgccatcaagcatggtcgtgatagtgaacctttaaagataacaattcgtatc 
agacttacgaagcgcaaattacatattctggttcatgataatggcatcggtatgtctcca
tcacatttagaacacgtgcgccaatcccttcatcacgatgtttttgatacgacacaccta 
ggtttaaatcatttacataatagagccatgattcaatatggaacatatgcacgtctgcac 
attttctcaagaagccagcaagggacattaatgtgttaccaaataccacttgtctag 

Sequence 720

MLRYSLRTASHTVKLAEELSYIEQYVAIQNIRFDDMIQLYIDATEGVQHQTIGKMMLQPL 
VENAIKHGRDSEPLKITIRIRLTKRKLHILVHDNGIGMSPSHLEHVRQSLHHDVFDTTHL 
GLNHLHNRAMIQYGTYARLHIFSRSQQGTLMCYQIPLV* 

Sequence 721

Contig_0509_pos_8653_7898,

putative peptide of unknown function 

atgtttaaagtagttatttgtgatgatgaaaggattataagagaaggcttaaagcaaatg 
gttccatgggaggactatcatttcaccactgtttatactgccaaagacggcgtggaagca 

ttgtctttaattcgccaacatcaacctgaactcgtcattactgatatacgaatgcctcga
aaaaatggtgttgacctactagatgacatcaaagaccttgattgccagattatcatttta 
tcgagttatgacgacttcgaatatatgaaagccggtatacaacatcatgttcttgattat 
ttactaaagccagtagaccacactcagttagagcatattctagacatattagttcaaagg 
ttattagaacgcccacattctaccaatgatgacgcggcatatcatactgcctttcaacca 

ttattaaaaattgattacgatgactattatgtcaatcaaattttgtctcaaatcaagcaa
cattatcacaagaaagtgactgttcttgacttaattaatcctattgatgtaagtgagtca 
tacgccatgaggacgtttaaagaacatgtaggcattacgatagttgattatctaaatcgt 
tatcgtattttaaaatcattacatcttttagaccagcactacaagcattatgaaattgct 
gaaaaagtaggtttttctgagtataaaatgttttgctatcattttaaaaaatatttacat 

atgtcaccaagtgattataataagcaatcaaaatag

Sequence 722 

MFKVVICDDERIIREGLKQMVPWEDYHFTTVYTAKDGVEALSLIRQHQPELVITDIRMPR 
KNGVDLLDDIKDLDCQIIILSSYDDFEYMKAGIQHHVLDYLLKPVDHTQLEHILDILVQR 
LLERPHSTNDDAAYHTAFQPLLKIDYDDYYVNQILSQIKQHYHKKVTVLDLINPIDVSES
YAMRTFKEHVGITIVDYLNRYRILKSLHLLDQHYKHYEIAEKVGFSEYKMFCYHFKKYLH 
MSPSDYNKQSK* 

Sequence 723 
Contig_0509_pos_6213_5707,

is similar to (with p-value 5.0e-25) 

>sp:sp|P43984|Y318_HAEIN HYPOTHETICAL PROTEIN HI0318. >pir:p
ir|B64006|B64006 hypothetical protein HI0318 - Haemophilus i
nfluenzae (strain Rd KW20) >gp:gp|U32717|U32717_5 Haemophilu
s influenzae Rd section 32 of 163 of the complete genome. NI
D: g1573283.

atggaggacatcatgattttaactattttatttatctttttctgtattcgactcatcagc 
ttaaagatatctatgcaacacgcaaaacagctaaaggtagagggcgcggtggaatatggt 
gtgaaaaattcaaaatatctagccattacgcatgtattaatttacatgagtgcagctata 
gaagcattcattcgtaaggatacatttagtctacttaacggcattggcttaatcatattg
atcatcgcttatatcatgctatttatagttattaagacattaggtcgtatttggacattg 
aaattatttatactgcccaatcaccctattattaagtcagggttatataaagtaacgaaa 
catccaaactattttttaaatatcattcccgaattaattggtgtattactactaacaaat 
gctacatacacaacactcttattagttccatatgcttattttttaattgtacgtatccgt 
caagaagagaaattaatgaatatataa
Sequence 724 

MEDIMILTILFIFFCIRLISLKISMQHAKQLKVEGAVEYGVKNSKYLAITHVLIYMSAAI
EAFIRKDTFSLLNGIGLIILIIAYIMLFIVIKTLGRIWTLKLFILPNHPIIKSGLYKVTK
HPNYFLNIIPELIGVLLLTNATYTTLLLVPYAYFLIVRIRQEEKLMNI*

Sequence 725 

Contig_0509_pos_5264_4449,

is similar to (with p-value 2.0e-68)

>gp:gp|U30714|BAU30714_2 Bacillus anthracis Weybridge A toxi
n plasmid pXO1 right inverted repeat element (WeyAR) borderi
ng the toxin-encoding region, ORFA and ORFB genes, complete
cds. NID: g929970. >gp:gp|U30715|BAU30715_2 Bacillus anthrac
is Sterne toxin plasmid pXO1 left inverted repeat element (S
terneL) bordering the toxin-encoding region, ORFB and trunca
ted ORFA genes, complete cds. NID: g929973.

gtggatcaattaaaagtaaaatattcaatcaaattgatactagaagtattaaacatacct 
aaatcaacatattaccgatggaaaaacaaaacctataaaaatgataccgtaacacaaaaa 

gtcattgaattatgtaaagctaaccactatacctacggttatcgtaagattacagcattg
attaatcaatgttatacatcaccaattaatcataagagagtacagagaatgatgcagaag 
catcatttgaactgccgagttagacctaaaaagacgacaagaataggtaaaccgtattat 
aaaacggacaatttattacaaagacaatttaaagcgagttgtcccatggaagtattaaca 
accgatattacttatttaccatttggtcattctatgttgtatttatcttcgataatggat 

atttataacggagaaattgtggcgtataaaatagatgataaacaagaccaaagtttagtt
aatgatacattaaatcaaatcgatatacctgaaggttgtatattacatagtgatcaaggc 
agcgtttatacatcttatgcttattatcaattgtgcgaagaaaaaggcattatcagaagt 
atgtcccgaaagggaacacctgccgataacgccccgatagaaagtttccattcctcgcta 
aagtctgaaactttttacatcaataatgagcttaatcgctctaatcatattgtaatagat 

attgtcgaaaagtacattaaaaactataataataatcgaattcaacaaaaactaggctac
ttatccccagtaaaatacagagaattaatagcctag 

Sequence 726 

VDQLKVKYSIKLILEVLNIPKSTYYRWKNKTYKNDTVTQKVIELCKANHYTYGYRKITAL
INQCYTSPINHKRVQRMMQKHHLNCRVRPKKTTRIGKPYYKTDNLLQRQFKASCPMEVLT
TDITYLPFGHSMLYLSSIMDIYNGEIVAYKIDDKQDQSLVNDTLNQIDIPEGCILHSDQG 
SVYTSYAYYQLCEEKGIIRSMSRKGTPADNAPIESFHSSLKSETFYINNELNRSNHIVID 
IVEKYIKNYNNNRIQQKLGYLSPVKYRELIA* 

Sequence 727

Contig_0509_pos_3554_2733, 

is similar to (with p-value 2.0e-47) 

>gp:gp|AL031317|SC6G4_30 Streptomyces coelicolor cosmid 6G4.
NID: g3449234.

gtgagtgcaaaagtgaaaatggaagacattgacgctattgcagtaacacaaggcccagga
ttaataggagctttattgattggtattaatgcggctaaagctttggcatttgcttatgat 
aagcctattattccagtacatcatattgctggtcatatttatgccaatcacttagaacaa 
ccattaacgtttccactaatgtcattgattgtatctggtggtcatactgaactagtatat 
atgaaaaatcatttagatttcgaagtgattggtgaaacgagagatgatgcagtaggagaa 

gcttatgataaagttgctcgaacaatcaatcttccttatcctggtggaccgcatattgat
cgattagcagctaaaggtaaagatgtatatgattttccaagagtttggctcgaaaaagat 
agttatgattttagttttagtggtcttaaaagtgctgtaataaataaactgcataattta 
agacagaaaaatattgaaattgtagctgaagatgttgcaacgagtttccaaaatagtgtt 
gtagaagttttaacctataaagctattcatgcttgtaaaacttataatgttaatcgctta 

attgttgcaggtggtgttgctagtaataaaggattaagaaatgcactaagtgaagcatgt
aaaaaagagggtatacaccttactattccaagtcctgttctttgcactgataatgcagcg 
atgattggtgctgctggatattatttatatcaagctggtttgcgtggcgatttagcttta 
aatggacaaaataatattgatattgaaactttttctgtttaa 
Sequence 728

VSAKVKMEDIDAIAVTQGPGLIGALLIGINAAKALAFAYDKPIIPVHHIAGHIYANHLEQ
PLTFPLMSLIVSGGHTELVYMKNHLDFEVIGETRDDAVGEAYDKVARTINLPYPGGPHID 
RLAAKGKDVYDFPRVWLEKDSYDFSFSGLKSAVINKLHNLRQKNIEIVAEDVATSFQNSV 
VEVLTYKAIHACKTYNVNRLIVAGGVASNKGLRNALSEACKKEGIHLTIPSPVLCTDNAA
MIGAAGYYLYQAGLRGDLALNGQNNIDIETFSV* 

Sequence 729 
Contig_0510_pos_315_650,
is similar to (with p-value 4.0e-19)

>gp:gp|L42945|STALYTS_1 Staphylococcus aureus lytS and lytR
genes, complete cds. NID: g1854576.

gtggaaaattctattaaacatgcatttaaaaatcgtaaaaagaataatcatattgatgtg 
gatgttagcatgaagcaagactacttaagtatatctgttcaagataatggtcaaggcata 
ccagctgatcaattagatactattggatatacgacagtaacgtctaccactggtactggt
aatgccttagtcaatcttaataaaagacttactggactatttggaacaacatcggcactg 
aacattcaatcttctcaatcaggcacgactgtaagttgtttaattccatataaatcttct 
aaggaggaacactttaatgaaagcgttaatcgttga 

Sequence 730

VENSIKHAFKNRKKNNHIDVDVSMKQDYLSISVQDNGQGIPADQLDTIGYTTVTSTTGTG
NALVNLNKRLTGLFGTTSALNIQSSQSGTTVSCLIPYKSSKEEHFNESVNR* 

Sequence 731 
Contig_0510_pos_799_1389,

is similar to (with p-value 2.0e-53) 

>gp:gp|L42945|STALYTS_2 Staphylococcus aureus lytS and lytR 
genes, complete cds. NID: g1854576.

atggatgaaagtggtattgatttagctcaaaaaattaataaaatgaagcgatcaccacat 
attatctttgcaaccgctcacgagaaatttgcagtcaaagcctttgaattaaatgcaacc
gattatatattaaaaccttttgaaaaagaacgtattaatcaagctgtaaataaggttgac 
atggctaaagataaatcaaaaaacaaagataaaactatcacacctaaatatattgattat 
agtgatgatgagcgcgctcaaacacatgtactcccaattgaagtggatgaacgtattcac 
atcttaaatttcacagacattatcgcattatctgttaataatgggattacaacgatagat 
acaacaaaacaaagttatgaaacgaccgaaacacttaatcattacgagaaaaaactacct
tcctctctatttattaaaatacatcgcgctactatcgttaataaagaacatatccaaaca 
atagagcattggtttaattatacgtatcagctgacgttaacacatgaatttaaatatcaa 
gttagtcgttcttatatgaagacttttaaacaacaacttggtcttcaataa 

Sequence 732

MDESGIDLAQKINKMKRSPHIIFATAHEKFAVKAFELNATDYILKPFEKERINQAVNKVD
MAKDKSKNKDKTITPKYIDYSDDERAQTHVLPIEVDERIHILNFTDIIALSVNNGITTID
TTKQSYETTETLNHYEKKLPSSLFIKIHRATIVNKEHIQTIEHWFNYTYQLTLTHEFKYQ 
VSRSYMKTFKQQLGLQ* 


Sequence 733 

Contig_0510_pos_1584_1982, 

is similar to (with p-value 3.0e-28) 

>gp:gp|U52961|SAU52961_1 Staphylococcus aureus holin-like pr
otein LrgA (lrgA) and LrgB (lrgB) genes, complete cds. NID:
g1841516.

atgaaatacagtatttttcaacaagcattaacgattgcagtgattttacttatatcaaaa 
attattgaatcatttatgcctattccaatgccagcttcagtaattggacttgtactatta 
tttatcgcattgtgtacaggcattgtgaaattaggtcaagttgagactgtgggaactgca 
ttaaccaataatattggattcctattcgtaccagccggtatttcagtcattaactcttta
ccaatccttaagcaaagccctattttaattattttacttattattatttcaacactttta 
ttattaatttgtactggctttgcgtcacaattattagtgacgaaatcacttttcccttct 
aaagagaaaaatgaagaaacaagtcacataggagggtaa 
Sequence 734

MKYSIFQQALTIAVILLISKIIESFMPIPMPASVIGLVLLFIALCTGIVKLGQVETVGTA 
LTNNIGFLFVPAGISVINSLPILKQSPILIILLIIISTLLLLICTGFASQLLVTKSLFPS 
KEKNEETSHIGG* 


Sequence 735 

Contig_0510_pos_1986_2687,

is similar to (with p-value 2.0e-80) 

>gp:gp|U52961|SAU52961_2 Staphylococcus aureus holin-like pr
otein LrgA (lrgA) and LrgB (lrgB) genes, complete cds. NID:
g1841516.

atgattgaacatttaggaattaatacaccttattttgggatattagtatcattaatacca 
tttgtcatagcgacttatttttataaaaaaacgaatggtttctttttactagcaccttta 
ttcgtaagtatggttgcaggtattgcttttttgaaattgacaggaattagttatgagaat 

tataaaatcggtggcgacattattaatttctttctagaaccagctacaatatgctttgcg
attcctttatatcgcaagcgcgaagtattaaaaaggtattggttacaaatatttggtggt 
atagctgttggtacaattattgccttgttattaatttatcttgttgcaataacattccaa 
tttggcaatcaaattatagcatctatgctacctcaagctgcaacgacagcaattgcatta 
cctgtatctgacggtatcggtggtgtcaaagaattaacctcactcgcagttattttaaat

gcagttgtcatttctgctttaggtgctaaaatagttaaattatttaaaatatctaaccct
attgccagaggacttgcactagggacaagtggacacactttaggtgtcgcggcagctaaa 
gaattgggtgagactgaagaatcaatgggaagtattgcagttgtcatcgttggcgttatt 
gttgtagcagtagttcctatccttgctccaatcttattataa 

Sequence 736

MIEHLGINTPYFGILVSLIPFVIATYFYKKTNGFFLLAPLFVSMVAGIAFLKLTGISYEN 
YKIGGDIINFFLEPATICFAIPLYRKREVLKRYWLQIFGGIAVGTIIALLLIYLVAITFQ
FGNQIIASMLPQAATTAIALPVSDGIGGVKELTSLAVILNAVVISALGAKIVKLFKISNP 
IARGLALGTSGHTLGVAAAKELGETEESMGSIAVVIVGVIVVAVVPILAPILL*


Sequence 737 

Contig_0510_pos_3536_3931, 

is similar to (with p-value 1.0e-34) 

>gp:gp|AF009352|AF009352_4 Bacillus subtilis osmoprotectant
transport system OpuC including ATPase (opuCA), transmembran
e protein (opuCB), osmoprotectant binding protein precursor
(opuCC) and transmembrane protein (opuCD) genes, complete cd
s. NID: g2271388.

atgattgaatgtaacaaattacctttcattacttatgaccacctttcttttcttcaaagt 
aatgatgtttctttaaatattcttcagctatcactgcaggttccttaccttttccatccg
cttcataatttaacttctgcatttcttctgttgagattttaccttctaatttttttagtg 
ccttatcgatttctggattatcctttattaattgttcatttgcaagtggactaccgtcat 
aaggcgggaagaatttgcgatcatcttccaatattttcaaatcataagctgcaatacgtc 
catctgttgaatacccaactgctacatctaatttattattttttaatgcatcatatacta 
aaccaatttgcattggacgtgcactatcaaatttaa

Sequence 738 

MIECNKLPFITYDHLSFLQSNDVSLNILQLSLQVPYLFHPLHNLTSAFLLLRFYLLIFLV 
PYRFLDYPLLIVHLQVDYRHKAGRICDHLPIFSNHKLQYVHLLNTQLLHLIYYFLMHHIL
NQFALDVHYQI*

Sequence 739 
Contig_0510_pos_6909_0,
is similar to (with p-value 7.0e-53) 
>gp:gp|L35343|PSEAC0X_5 Pseudomonas putida TPP-dependent ace
toin dehydrogenase alpha and beta-subunits (acoA and acoB),
dihydrolipoamide acetyltransferase (acoC), g2,3-butanediol d
ehydrogenase (adh) and acoX genes, complete cds. NID: g52955
9.
atgaaagcagcagtatggtatggacaaaaggatgtacgcgttgaagatcgcgaacccaaa
gcaataaaagacaatgaagtgcaagttaaagtctcttgggccggtatctgtggtactgat
ttacatgaatatttggaaggacctatctttatttcaactgatcaaccggacccactactt 
ggtcaaactgcacctgtgactttaggtcatgaattttcaggtgtcatagaaaatgttggt 
aaagacgtatcacgttttaaaaaaggggatcgtgtggtagttaatccaacagtgtctaaa
agagaaaagccggaaaatgttgacttgtatgatggttattcatttataggactaggttct
gatggtgcatttgccgagtttactaatgctcctgaaacaaatgtttatcatctaccagat 
aatgtttcagcacgagaaggtgctcttgtagaaccaacagccgttgctgtccaagcagtt 
aaagaaggcgaattattattcggtgatactgtagcagtatttggcgctgggccaattggt 

ttgttaactattgttgcagcaaaagctgctggtgcaagtaaaatatttgtctttgactta
tcagaagaacgtttagcgaaagctaaaagtgtcggtgcgactcacgtgtataactcaggt 
aacgtcgatccagtacaaacggtttatgaacacactgacaacggtgtagatgtgtcattt 
gaagttgctggtgtaggtattactttacaacaatctattgaagtaacacgtccgcgtggt 
actgctgtcatcgtatcaatcttcggtcatcccgtagaattcaatccattattacaaatg 

aataaaggtgtcaagttaacaactacaattgc

Sequence 740

MKAAVWYGQKDVRVEDREPKAIKDNEVQVKVSWAGICGTDLHEYLEGPIFISTDQPDPLL 
GQTAPVTLGHEFSGVIENVGKDVSRFKKGDRVVVNPTVSKREKPENVDLYDGYSFIGLGS 
DGAFAEFTNAPETNVYHLPDNVSAREGALVEPTAVAVQAVKEGELLFGDTVAVFGAGPIG
LLTIVAAKAAGASKIFVFDLSEERLAKAKSVGATHVYNSGNVDPVQTVYEHTDNGVDVSF 
EVAGVGITLQQSIEVTRPRGTAVIVSIFGHPVEFNPLLQMNKGVKLTTTIA 

Sequence 741 

Contig_0510_pos_6273_5167,
is similar to (with p-value 2.0e-94) 

>gp:gp|AF009352|AF009352_2 Bacillus subtilis osmoprotectant
transport system OpuC including ATPase (opuCA), transmembran
e protein (opuCB), osmoprotectant binding protein precursor
(opuCC) and transmembrane protein (opuCD) genes, complete cd
s. NID: g2271388. >gp:gp|Z99121|BSUB0018_69 Bacillus subtili
s complete genome (section 18 of 21): from 3399551 to 360906
0. NID: g2635827.

atgattgaggcgacagatggacagattatgatgaatggaaaagatgtccgtaatatgaat 
cctgttgaattgcggagaagtatcggttatgtcattcaacaaattggtttgatgccacat 
atgactattcgagaaaatattgttttagtacctaaacttttaaaatggtctaaagagaag 
aaagatgaaaaagctaaagaacttattaaactggtagatttacctgaagaatatttggat 
cgttatccagctgaattgtcaggagggcaacaacaacgaattggtgttgtgcgcgcttta 
gcagctgaacaagatattatattaatggatgaacctttcggtgcattagatcctattaca 
cgcgatacattacaagatttagtaaaggaattacaacaaaaattaggaaaaacatttatt 
tttgtcactcatgatatggatgaggctattaaattagcagacaaaatatgtattatgtct 
aagggaaaagtcgttcaatacgatacacctgacaatattttacgatatcctgcaaatgac 
tttgttagagattttattgggcaaaatcgcttgattcaggatcgtcctaatatgaaatct 
gtggaaagtgctatgatcaaacccgtcactgttaaagcagatgattcattgaatgatgca 
gtaaatattatgagaacacgtcgagtagacactatttttgtagtcaataatcaaaataaa 
ttattaggatttttagatattgaagatatcaatcaaggattacgtgcgcgtaaagaatta 
attgataccatgcaaagggatgtctacaaagtacatatcaattcaaagttacaagactca 
gtgcgtactattctaaaacgtaatgttagaaatgtccctgtggtcgataatgatgaacat 
ctcattggtttaattacacgtgcaaacttagtcgatattgtgtatgactcaatttggggc 
gaagaagattctgatagttatgagatcccaaatgaaagcttagatgagaataatcacgat 
ttaccacaaaatcaaactgatacacgaacaaatataaatgaagatgtgaatgattatcat 
gatgctcaacatagaggtgaggattaa 

Sequence 742

MIEATDGQIMMNGKDVRNMNPVELRRSIGYVIQQIGLMPHMTIRENIVLVPKLLKWSKEK
KDEKAKELIKLVDLPEEYLDRYPAELSGGQQQRIGVVRALAAEQDIILMDEPFGALDPIT 
RDTLQDLVKELQQKLGKTFIFVTHDMDEAIKLADKICIMSKGKVVQYDTPDNILRYPAND 
FVRDFIGQNRLIQDRPNMKSVESAMIKPVTVKADDSLNDAVNIMRTRRVDTIFVVNNQNK
LLGFLDIEDINQGLRARKELIDTMQRDVYKVHINSKLQDSVRTILKRNVRNVPVVDNDEH
LIGLITRANLVDIVYDSIWGEEDSDSYEIPNESLDENNHDLPQNQTDTRTNINEDVNDYH
DAQHRGED* 

Sequence 743
Contig_0510_pos_5164_4529,

is similar to (with p-value 5.0e-55) 

>gp:gp|AF009352|AF009352_3 Bacillus subtilis osmoprotectant
transport system OpuC including ATPase (opuCA), transmembran
e protein (opuCB), osmoprotectant binding protein precursor
(opuCC) and transmembrane protein (opuCD) genes, complete cd
s. NID: g2271388. >gp:gp|Z99121|BSUB0018_68 Bacillus subtili
s complete genome (section 18 of 21): from 3399551 to 360906
0. NID: g2635827.

atgaaagcattcttacaagaatatggtagtcaacttttatcaaaagcagtagaacatttt 
tatatttctatgtttgcattattgttagcgattgtagtagctgtccctttaggtatttta
ttatcaaaaacgcaacgcacagctaatgtggtattaacagttgctggcgtgcttcaaacc 
attcctactttggctgtgctagctatcatgattccaatatttggggtaggaaaaacacca 
gctattgttgcattatttatctatgtattattaccaattttaaataatacagtattaggt 
gttaaaaatatcgataaaaatgtcattcaagctggtcaaagtatgggaatgactaaattt 
caattaatgaaagatgtagaaatgcctttagctttaccacttattattagtggtattcgt
ctatcaagtgtatacgtcattagttgggcaacactcgcaagttatgtaggtgcaggtgga 
cttggggatcttgtatttaatggattaaatctctatcaaccacctatgattattagtgca 
gcgattgttgttactttattagcattagttattgactttatactttcattagttgaaaaa
tgggttgtacctaaaggattaaaagtatctagataa 

Sequence 744

MKAFLQEYGSQLLSKAVEHFYISMFALLLAIVVAVPLGILLSKTQRTANVVLTVAGVLQT 
IPTLAVLAIMIPIFGVGKTPAIVALFIYVLLPILNNTVLGVKNIDKNVIQAGQSMGMTKF
QLMKDVEMPLALPLIISGIRLSSVYVISWATLASYVGAGGLGDLVFNGLNLYQPPMIISA
AIVVTLLALVIDFILSLVEKWVVPKGLKVSR*

Sequence 745

Contig_0510_pos_4461_3568,

is similar to (with p-value 7.0e-85) 

>gp:gp|AF009352|AF009352_4 Bacillus subtilis osmoprotectant
transport system OpuC including ATPase (opuCA), transmembran
e protein (opuCB), osmoprotectant binding protein precursor
(opuCC) and transmembrane protein (opuCD) genes, complete cd
s. NID: g2271388.

gtgttatctggatgcagtttaccaggtttaggtgatggaaatgcaaaagatgatgtgaaa
atcacaacgactgaaacaagtgaaactaagattataggtcatatggaaaaattattaatt 
gaacatgaaactgatggaaaaatcaaaccgacgttgattgggaacctaggttctagcatt 
attcaacataatgcgttacaacgtggtgatgcaaatatgtcagcggtacgttacacaggt 
actgaattgacgagtgtattagcagctaaacctactaaagatcctgataaggccatgtct 

gaaacacaacgcttatttaaaaagaaatatgatgaaaagtattatcattcacttgggttt
gcgaatacatacgcattcatggtgacaaaagaaacggctaaaaagtatcacttagaaaaa 
gtatcagatttagagaaatataaagatgaactacgtcttggaatggatacccaatggatg 
aaccgtgcaggtgatggatatccagcttttgttaaagattatggatttaaatttgatagt 
gcacgtccaatgcaaattggtttagtatatgatgcattaaaaaataataaattagatgta 

gcagttgggtattcaacagatggacgtattgcagcttatgatttgaaaatattggaagat
gatcgcaaattcttcccgccttatgacggtagtccacttgcaaatgaacaattaataaag 
gataatccagaaatcgataaggcactaaaaaaattagaaggtaaaatctcaacagaagaa 
atgcagaagttaaattatgaagcggatggaaaaggtaaggaacctgcagtgatagctgaa 
gaatatttaaagaaacatcattactttgaagaaaagaaaggtggtcataagtaa 

Sequence 746

VLSGCSLPGLGDGNAKDDVKITTTETSETKIIGHMEKLLIEHETDGKIKPTLIGNLGSSI
IQHNALQRGDANMSAVRYTGTELTSVLAAKPTKDPDKAMSETQRLFKKKYDEKYYHSLGF 
ANTYAFMVTKETAKKYHLEKVSDLEKYKDELRLGMDTQWMNRAGDGYPAFVKDYGFKFDS 
ARPMQIGLVYDALKNNKLDVAVGYSTDGRIAAYDLKILEDDRKFFPPYDGSPLANEQLIK
DNPEIDKALKKLEGKISTEEMQKLNYEADGKGKEPAVIAEEYLKKHHYFEEKKGGHK* 

Sequence 747
Contig_0510_pos_3481_2873,

is similar to (with p-value 9.0e-30) 

>gp:gp|AF009352|AF009352_5 Bacillus subtilis osmoprotectant
transport system OpuC including ATPase (opuCA), transmembran
e protein (opuCB), osmoprotectant binding protein precursor
(opuCC) and transmembrane protein (opuCD) genes, complete cd
s. NID: g2271388. >gp:gp|Z99121|BSUB0018_66 Bacillus subtili
s complete genome (section 18 of 21): from 3399551 to 360906
0. NID: g2635827.

atgtcggtatatggtgtgttgtttgcatgtataattggaattcctattggtattttcata 
gccaagtataaacgtttatcgtggccggtaattacaattgcaaatattatacaaactgtt
ccagcaatcgctatgttagccatacttatgttggctatgggattaggaccaacaactgtt 
gttgtaactgtattcctatattcgttattacctattattaaaaatacttatactggtatt 
gtagaagttgatgaaaatattaaagacgctggtaaaggtatgggaatgacggggaatcaa 
atattaagaatgatagagttaccattatctttatctgttattattggtggtgttagaatt 
gcacttgttgttgctatcggaatagtagcgattgggtcatttatcggtgctccaacacta
ggtgatattattattcgtggtacaaattcaacagatggaacaacattcatcttagcaggt 
gccataccaattgctttaatagcaattatcatagatataggattacgttatctagaaaaa 
cgtttagatcctactcgtaaaaacaaaaaagattcaatgcaaaaacatcaagtacaaaaa 
ttacgttaa 

Sequence 748 

MSVYGVLFACIIGIPIGIFIAKYKRLSWPVITIANIIQTVPAIAMLAILMLAMGLGPTTV
VVTVFLYSLLPIIKNTYTGIVEVDENIKDAGKGMGMTGNQILRMIELPLSLSVIIGGVRI
ALVVAIGIVAIGSFIGAPTLGDIIIRGTNSTDGTTFILAGAIPIALIAIIIDIGLRYLEK
RLDPTRKNKKDSMQKHQVQKLR*

Sequence 749

Contig_0511_pos_4579_0,

is similar to (with p-value 6.0e-48) 

>sp:sp|P13485|TAGF_BACSU TEICHOIC ACID BIOSYNTHESIS PROTEIN
F. >pir:pir|S06049|S06049 rodC protein - Bacillus subtilis >
gp:gp|X15200|BSRODC_2 Bacillus subtilis rodC operon. NID: g4
0098. >gp:gp|Z99122|BSUB0019_69 Bacillus subtilis complete g
enome (section 19 of 21): from 3597091 to 3809700. NID: g263
6029.

gtggtacgtatgccgggtactacgacaccaaagtataagcgtaattttaatcgtgaaaca 
tcacgttgggattatttaatttcgccaaatagatattcaactgaaatatttagaagtgct 
ttttggatggatgaagaaagaatattagagataggttatccaagaaatgatgtattagtt 
aatagagccaatgatcaagagtatttagatgaaattagaactcacttaaatttacctagt 
gataaaaaggttattatgtatgctccgacatggagagacgatgaatttgtgagtaaagga
aaatatttgtttgaattaaaaattgatttagacaacctttataaagaactcggagatgat 
tatgtgattttattacgcatgcattatctcatttctaacgcacttgatttatctggttat 
gaaaattttgcaattgatgtttcaaactataatgacgtctctgaattatttttaataag 

Sequence 750

VVRMPGTTTPKYKRNFNRETSRWDYLISPNRYSTEIFRSAFWMDEERILEIGYPRNDVLV
NRANDQEYLDEIRTHLNLPSDKKVIMYAPTWRDDEFVSKGKYLFELKIDLDNLYKELGDD
YVILLRMHYLISNALDLSGYENFAIDVSNYNDVSELFLIX 

Sequence 751

Contig_0511_pos_4471_3650,

is similar to (with p-value 6.0e-42) 

>sp:sp|P39074|BMRU_BACSU BMRU PROTEIN. >gp:gp|L25604|BACBMRU
RBE_1 Bacillus subtilis bmrU, multidrug efflux transporter (
bmr) and its regulator (bmrR) genes, complete cds, and branc
hed-chain 2-oxo acid dehydrogenase (bfmB) gene, 3' end. NID:
g2558636. >gp:gp|D84432|BACJH642_251 Bacillus subtilis DNA,
283 Kb region containing skin element. NID: g2627063. >gp:g
p|Z99116|BSUB0013_111 Bacillus subtilis complete genome (sec
tion 13 of 21): from 2395261 to 2613730. NID: g2634723.
atgtgtaaacacctctctcttcaactcagtgaaaataaaggcgatattattaaatattgt 
aaatctattaaaaatgaaaattatagctctgatgtagacgttttatttattttaggtgga 
gatggtacacttaatgaactagtaaatggcgttatgcagtatcagttaaatttaccaatc 

ggtgtaataccaggtggtacctttaacgattttacaaaaacacttcaactgcaccctaat
tttaaaacagctagtgagcaattattaacatcacatgctgaatcatatgatgtgttaaaa 
gtgaacgacttatatgtacttaatttcgttggacttggcttaatagtacaaaatgcagag 
aatgttcaagatggttctaaagatatattcggtaaattcagctatattggatcaaccgtt 
aaaacgttattaaatcctgttaaatttgatttctcattgactgttgatggtgaaacaaaa 

gaaggcaatacttcgatgatgttaatagcaaacggtcccaatataggtggtggacaaatt
ccgctaaccgatttatcgccacaagatggaagagcaaacacatttgtatttaatgatcaa 
acactaaatatattgaatgatatattaaaaaaacgtgatagtatgaattggaacgaaatc 
acacaaggtattgatcacatatcaggtaagcacatcacactctcaacaaaccctagtatg 
aaagtggatattgatggcgaaattaatttagaaacaccaattgagattcaagtattaccc 

aaagcgatacaacttcttactgcaactgaacaaaataattaa

Sequence 752 

MCKHLSLQLSENKGDIIKYCKSIKNENYSSDVDVLFILGGDGTLNELVNGVMQYQLNLPI 
GVIPGGTFNDFTKTLQLHPNFKTASEQLLTSHAESYDVLKVNDLYVLNFVGLGLIVQNAE 
NVQDGSKDIFGKFSYIGSTVKTLLNPVKFDFSLTVDGETKEGNTSMMLIANGPNIGGGQI
PLTDLSPQDGRANTFVFNDQTLNILNDILKKRDSMNWNEITQGIDHISGKHITLSTNPSM
KVDIDGEINLETPIEIQVLPKAIQLLTATEQNN* 

Sequence 753 
Contig_0511_pos_3293_642,

is similar to (with p-value 0.0e+00) 

>sp:sp|Q24803|ADH2_ENTHI ALCOHOL DEHYDROGENASE 2 (EC 1.1.1.1
) (ADH) / ACETALDEHYDE DEHYDROGENASE (EC 1.2.1.10) (ACDH).
>gp:gp|U04863|EHU04863_1 Entamoeba histolytica HM1:IMSS alco
hol dehydrogenase 2 (EhADH2) mRNA, complete cds. NID: g48842
9.

atgtttgtgaattatttcacaatatctaaggagtggttgtatatgttatctgtaactaaa 
aaaaatacatatgaatcaaacaaagatgaagtcacacaaatgattgattcattagcagaa 
aaaggacaagaagctctaaaagaactatctaaaaaatcacaacatgagattaatgacatt 

gtacatcagatgagcatggctgctgttgatcagcatatgcatttagctaaactagcttac
gacgaaacaggtagaggtatttatgaagacaaagctatcaaaaatttatatgcctcagag 
tacatatggaattcaatcaaagacaataaaacagtaggtatcataggtgaagataaacaa 
aaaggattaacgtatgtagctgaacctataggcgtgatttgtggagtgacaccaacgacc 
aaccctacatctacaactattttcaaagcaatgattgctattaaaacaggtaatccaatt 

attttcgcatttcatccaagtgcacaacaatcatcaaaatatgctgctaaagtcatttta
gaagctgcaacaaaagcaggtgctcctaaagattgtattcaatggatagaagtgccatca 
attgaggcaactaaacaattaatgaatcataaagatattgctttagttctagcgactgga 
ggctctggaatggtaaagtccgcatattcgacaggtaaacctgcattaggagtcggtcca 
ggtaatgttcctacttatattgaaaaaactgctcatatcaaacgtgctgttaatgatatc 

attggttctaaaacttttgataatggtatgatttgtgcttctgaacaagtcatggttgtt
gataaagaagtatacactgacgtcgttaaagaattcaaattacaccaaacatattttgtt 
aataaaaatgaactacaacaattagaagatgccatcatgaatgaagataaaactgcagtt 
aaacctgatatagttggtaaatctgctgtagatatagcgaaattgtcaggaattagtgtt 
ccagaaaaaacaaaattattagtcgcagaaattgatggaattggaaaagattatccttta 

tcacgtgaaaaattatcacctgtactcgcaatggtaactgcaaaatcaacaggacatgca
ctacaaatttgtgaagacatattaaaatttggtggtttaggtcacactgctgtaattcac 
accgaggatagtcaattacaacaaaaattcggtctaaaaatgaaagcttgccgtgtattg 
gtaaatacaccttctgctgtcggaggaattggaaatatgtataatgaactcattccttca 
ctcacgttaggttgtggttcatatggtagaaattctatttctcataatgtaagcgcagta 
gacttattaaatattaaaacaatagcaaaacgtcgtaataatatgcaatggtttaaactc
ccacctaaagtttattttgaagaaaattcagttatgtatttgacagagatggataatgtt 
gaacgtgtaatgatagtttgtgatccaggaatggttaatattggttatactgatatagtt 
gaacaagtgctgagacgccgagaaaaccaaccacaaatcaaagtgtttaacgaagttgaa 
cctaatccatcaactcatacagtctataaggggttagaaatgtttataaatttccaacct
aatactattattgcactcggtggcggttcggcaatggatgcagccaaagcaatatggatg 
ttctttgagcatccagaaacttcattttttggggcaaaacaaaagttcttagatattcgt 
aaacgtacttataaaattaccaaacctaaaaacgcaaaatttatatgtataccaacgaca 
tcaggaactggttctgaagtgacaccttttgcagtaattactgatagcgagacacacgtt 

aagtatccactagcagattatgcgttaactcccgatattgctatcgtcgatccacaattc
gtattaagtgtacctaaagatgttgccgcagatacaggaatggatgttttgacacatgcc
attgaatcttacgtctctgtcatggcttcagattatacaagaggcttaagcttacaagca 
ataaagttaacttttgattatctaaaatcatcagttcaagaaaatgacaaacactcacga 
gaaaaaatgcataatgcttcaacaatggccggtatggcatttgccaatgcttttttagga 

atttctcattctatcgcacataaaattggtggtgaatatggtattccccacggcagaaca
aatgctattttattaccacatgtcattcgctataatgccaaagatccacaaaaacatgca
ctgtttcctaaatatgatttctttagagcagatactgactatgctgacattgcaaaattt 
ttaggactcaaaggtaatacaactgaagaattagtggatgctctagctaatgcggtgtat 
gatttaggatgttcagttggtattgatatgaatttaaaatcacaaggcgtaactgaagag 

cttcttcactctactatagacagaatggctgaattagcatttgaagatcaatgtacaact
gctaatccaaaagaaccgctaattagtgaacttaaaggcattatcgaaacagcatatgat 
tatgaaagataa 

Sequence 754 

MFVNYFTISKEWLYMLSVTKKNTYESNKDEVTQMIDSLAEKGQEALKELSKKSQHEINDI
VHQMSMAAVDQHMHLAKLAYDETGRGIYEDKAIKNLYASEYIWNSIKDNKTVGIIGEDKQ
KGLTYVAEPIGVICGVTPTTNPTSTTIFKAMIAIKTGNPIIFAFHPSAQQSSKYAAKVIL
EAATKAGAPKDCIQWIEVPSIEATKQLMNHKDIALVLATGGSGMVKSAYSTGKPALGVGP 
GNVPTYIEKTAHIKRAVNDIIGSKTFDNGMICASEQVMVVDKEVYTDVVKEFKLHQTYFV 

NKNELQQLEDAIMNEDKTAVKPDIVGKSAVDIAKLSGISVPEKTKLLVAEIDGIGKDYPL
SREKLSPVLAMVTAKSTGHALQICEDILKFGGLGHTAVIHTEDSQLQQKFGLKMKACRVL 
VNTPSAVGGIGNMYNELIPSLTLGCGSYGRNSISHNVSAVDLLNIKTIAKRRNNMQWFKL
PPKVYFEENSVMYLTEMDNVERVMIVCDPGMVNIGYTDIVEQVLRRRENQPQIKVFNEVE
PNPSTHTVYKGLEMFINFQPNTIIALGGGSAMDAAKAIWMFFEHPETSFFGAKQKFLDIR 

KRTYKITKPKNAKFICIPTTSGTGSEVTPFAVITDSETHVKYPLADYALTPDIAIVDPQF
VLSVPKDVAADTGMDVLTHAIESYVSVMASDYTRGLSLQAIKLTFDYLKSSVQENDKHSR
EKMHNASTMAGMAFANAFLGISHSIAHKIGGEYGIPHGRTNAILLPHVIRYNAKDPQKHA
LFPKYDFFRADTDYADIAKFLGLKGNTTEELVDALANAVYDLGCSVGIDMNLKSQGVTEE
LLHSTIDRMAELAFEDQCTTANPKEPLISELKGIIETAYDYER*

Sequence 755 

Contig_0512_pos_8604_8128, 

putative peptide of unknown function 

gtgaataaggatgctgagaaccctaaacctaaagaagggatagggacttggattggaaaa 
gatattaaaacactaacgcatcattatggacaagctgatcggtcttatccatataaaaat
gggttaaaaaattatgtctttaaacagaaagatgaatattatattgtaagtactaataaa 
ggaacaatcacatcagtttatgccacaggtaaaggtgtgaaagtgagcccacttaaaata 
ggtgaaagttcatctcatatttttgaagatactagtattaatccagaaccaactgtcaaa 
acgaaaggtaaaacttataaatttgaaatgtctgatgaagacttaaagacacagacgtta 
attaaatatggagatgtttatgctcaaatatattctgatcaacaaactaataaaattttg
gcagtgagatttttagatgcaaatacattggcaacactacaaccatataagttataa 

Sequence 756 

VNKDAENPKPKEGIGTWIGKDIKTLTHHYGQADRSYPYKNGLKNYVFKQKDEYYIVSTNK 
GTITSVYATGKGVKVSPLKIGESSSHIFEDTSINPEPTVKTKGKTYKFEMSDEDLKTQTL
IKYGDVYAQIYSDQQTNKILAVRFLDANTLATLQPYKL* 

Sequence 757 

Contig_0512_pos_8113_7241,
is similar to (with p-value 4.0e-18)

>pir:pir|S58131|S58131 integral membrane protein LmrP - Lact
ococcus lactis >gp:gp|X89779|LLLMRP_1 L.lactis DNA for LmrP
gene. NID: g1052753.
atggatgcgattacgcctgaagttgagcaatatatttataagataagttattggctgacg
aatattgctgtcgcctttggtgcgctcataggtggattgatgtatggggcacataaatct 
atgttgtttttcatcgcttttgtcatttacattatggtttttatagcacttatcgtatgg 
ttgcctaaagatttaaatattgttactcagtcgcacacacatcatgctaatgagaaacaa 
ttctccatgggtcaaatattaaaaagttataaaccagcatttaaagatacaacatatcta 

cttctaattataggatttagtattttaacaatgggtgagttatctgcatcatcgtacatt
tcagtgcgtttaaaacaagagtttgatccgatgatattgttttcgttacatatcaatggc 
gttaaaatgtattcacttctattaatgacgaatacaatcattgttataatttttacctat 
tttatttcaaaaattgttatgagaatgaatgttaaaacagcattattggttggaattatt 
ttttatgtcattggatattcgaacctcacttatcttaatgattttacgttacttatcata

tttatgattatagcgacgataggtgaaatggtatattctccaattcttgaagaaaatcgt
tttaaaatggttccttctcataaaagagggacatattcagcagtgcatgctttaggattt 
aacctagctgaattacttgcaagatttggaattatattaggagtgtttttaacttcaatg 
gagatggggatctatatgtttgttttattattactaggtggcatgtcactttacattgca 
gtgagtcgttttaataatacaaattcacaataa 

Sequence 758 

MDAITPEVEQYIYKISYWLTNIAVAFGALIGGLMYGAHKSMLFFIAFVIYIMVFIALIVW
LPKDLNIVTQSHTHHANEKQFSMGQILKSYKPAFKDTTYLLLIIGFSILTMGELSASSYI
SVRLKQEFDPMILFSLHINGVKMYSLLLMTNTIIVIIFTYFISKIVMRMNVKTALLVGII
FYVIGYSNLTYLNDFTLLIIFMIIATIGEMVYSPILEENRFKMVPSHKRGTYSAVHALGF
NLAELLARFGIILGVFLTSMEMGIYMFVLLLLGGMSLYIAVSRFNNTNSQ*

Sequence 759 
Contig_0512_pos_0_6968, 

is similar to (with p-value 0.0e+00)

>gp:gp|AF007865|AF007865_3 Bacillus licheniformis bacitracin
synthetase operon including bacitracin synthetase 1 (bacA),
2 (bacB) and 3 (bacC) genes, complete cds. NID: g2982193. 
gtgacatactgggtaaagttaagtcgcgacattgagttacgtagattaatgtatgcatta 

ttagatgtcgttcaaagtcaacctgtgttgcgtacacagtttgtgacagatgattttaat
caactcaagataaatttaagagatttttttccatttattgaaattaaagaagttaatgaa 
atgtcgcaaagcatagatttagaagcattctttacacgtaatttaaattcctaccatttc 
aatcaattacctctgtttaattttaagatatatcaatttctagatggcgcctacctactt 
ttagatttccacgctactatttttaatgaaagtcaattaactccatttttacaacaatta 

aatattgcttatacccactctttaaaaagtgaatatagtatctcggatttttataattgg
attaaagaaatgaatcaaaagatggatcaaaatcaagttgtgtgtccatcaaagcacttc 
aacgtattgaatgcagacggtgataattacgcttacatacctgtaaagaatacatctgaa 
aagaaaaaaatgtgttctttgcatgcagaactaccatctttagacattgatgcgtggatt 
gtaagtatttacttagcgcatcattttataagtcagtcttctgatgtgacgttaggcatc 

catttttcgatagataataaaaatactgagaatatgatggttttaaacacagacattgcc
ccacttaatttaagtattagtcaaagtgacgccgtaaaagatatggtggatgagtgttcc 
gcgctacttgaagagcttcaaatgtgtggtgcgtcttttgttgttcaacctaaagcagta 
caaatagatgtagaaacgatgattcatattgaaaaagtacaagaacaatttgagcttaat 
catatatgtcatcatatacatcgtctatacaatgaagcatcatcattcgcggatttagag 

ttttatcctcatgtgcagggtggttttgatatagtttataatgacaacgtttatgatgaa
ttaactgtaaatacgttagtcaaattaattaatgggatttatatgcaaattacacaaaat 
ccatcattattaattaaagatataaaactcagtgatcgctcagatttagctaaatataat 
gacatcaatcttcaaaacaatgacattaattatagtgaggtcacttataaaaccgtggtt 
gaaagattcgaacgtcaagtgcaccaacatcccgatagtattgcgttgcaatatgaacaa 

cgatcgatgacatatcatcaattaaatcaatgtgcgaatcttttagcatatagattgcgt
ttaaatcatcagattgaacctaatgatatggtggcattaatagcagaacgcagcttagaa 
atgattattggaatgttagggatcttgaaagctggtgcaggctacataccaattgatccg 
gattatcctgaagaaagaatgaattatattattgaggacgcaaaacctaaagcggttgta 
acatatcgtacatcatttcaatcaggtttacctcaaatggatatagaattgatagttgat 
tcaagagaacatgatattgataacccgagaggcattaattgttcagaagatatcgcttat
gtcatctatacatcaggaacgactggtaaacctaaagggacactggtgccacatagagga 
attgatcgcttagtacacaatccaaattatgtcgaattgaacgaaaatacaaccgtctta 
ttatcaggaacagtagcttttgatgcagcaacctttgaaatatatggtccattattgaat 
ggtggacggttagtcattacatctaaagatacgttgttaaatcctcaattgttagatcaa
gctattactgaaaataaagtcaacacgatgtggttaacgtcatctttatttaatcaaatt
gctagcgaacgtatcgaagcactagaatctttaacttatttacttattggtggggaagtg
ttaaatgctaaatgggttcacttattaaattcgcgtgagtgtcatcctcaaataatcaat 
ggttatggaccgacagagaatacaacatttactacaacttttgcgattccacaagagatg 

ccttcacgtatacctattggtttacctattagtggaacgacagtttatgtcatgcaaggt
aatcgtatttgtggcgtaggtgttccaggtgaattgtgcattggtggtgcaggtttagca 
aaaggttatttaaatcaacctaaacttactgctgaacgttttattcagtcaccttttaac 
aatgaaatgctttatcgaagcggtgatttagttcgtcttcaagaagatggctatattgat 
tatattagtcgtatcgataagcaagttaaaatacgcggttttagaatagaattatcagaa 

attgaaaaagcattagaagctatacgtgatattaataaagctgtagtcatcgttcgagag
caagaccaagataaacaaatagtggcatattatgaagcatcgcaattaaaatcaacaggt 
caattaaaagatattttaagtgaaacattacctgaatatatgatacctgtgcattttatg 
aaggtggatcgtatacctatcacgatgaatgggaaattagatgtgcgtgcattacctgaa 
attaatctaaagaataatagaaattatgtagaaccacgtaacgatattgaacgcacagtt 

tgccgtattttcgaagagattttacatgttgatcaggtaggtgttaaagataatttcttt
gaactaggtggacactctcttagagcaacattagttgtaaaccgtattgaagaaaggtta 
aaaaaacgtcttaaagtaggtgatttaatgaaatcgcctactgtagagcaacttggacaa 
caaattgaagaactgcaaaatgatgtctatgaagtgattcccaaagcaaatgaatcgtat 
caatatgatttaagtgcgtctcaaaaaagtatgtatcttttatggaaggtcaatcctaaa 

gacacagtgtataacattccattcttatggagattatcttctgaacttaatgttatgcaa
ttgcaacgtgcattatctaagttgattgaacgtcatgaaatattacgaacacaatatgta 
attgatgacaatgaagttaaacaacgtattgcgacacatgtttcgcctgattttgaagag 
gtaacgacatctctaacgaacgagcaagatattattcaatcatttatggaaccgtttgat 
ttagaacaaccaagtcagatgcgagttaaatatatacatggaccacaacaagattattta 

tttatggatactcatcatagtattaatgatggtatgagtaacacgattttactatctgat
ttgaacgctttataccaagataaatcattacctgaacttaagcttcagtataaagattat 
agtgagtggatggtgcacagagacttatctaaacaacgtcacttttggttacagcaattt 
gaaaatcaggttccaatattaaatatgcctacggattatcctagaccaagtattaaaaca 
accaacggtaatatgttgacgtttcattacaatcgtcaaatcaaacagcaattgaaatct 

tatgtagaacaacatcaagtgacagactttatgttctttgctagtgcaatcatggtatta
ttgcacaaatatacacgtcaggacgatatcgctattggtagtgtaatcagtgcgcgtact 
catcgcgatactgaaaatatgttaggtatgtttgctaatacacttgtatatcgtggtcga 
ccacatgatcaaaagacatgggatcaattgatggctgagatgaaagagatgtgtctaggg 
gcatatgaacatcaagaatatccttttgaaagcttagtcaatgatcttgttgatgaaaga 

gatgcttcacataatccgttatttgatgtgatgctcgtacttcaaaataatgaaacaaat
catgcgaattttggacatagtcaattgacacatattccacctcagtcaacaacagctaaa 
tttgatttgtcatttattattgaagaagatcaagatgactatgtcgtcaatattgaatat 
aatacagatttatataaacaagagaccattcatcatattgctgaacaacttcaaatgatt 
attaaacatgtaatatctaccgaaaacctaaaaattcaagatattgatgaaaatgatgac 

ttattaatttggttggacaagcatgtgaatgattgttctttagacttgccaaaaaataag
tcaatacagcaacttttacatgatgtcatgaaagcgaaagcagatgatgtagcacttaaa 
atgaatggacaatcgatgacgtatcaagaacttgatgattattctaatagtatggctcaa 
acattgatacaaaatggcattcaaaaaggggaacgtgtagcccttttaactgaacgaagt 
tttgaaatggttgctagtatgattgctgtattaaaagttggaggttcttatgtacctatt 

gacgtcacttatcccgataaacgcattgaatttattattgaagacgctgaagtcgcagca
gtgctcacatatggaaaagcaatatcctcacatataccagtaattaaaattgaagatatt 
gataacactgaaaataataaaaggttaaatatagaatatgcagggaatttggaagatgat 
atgtatcatatttatacatctggaacaacaggaaagcctaaagcagtatcagtgaaacaa 
cgtaatatattaaatttagtatgtgcttggacaaaaagactcaatttatccgatgatgaa 

gtctatctgcagtacgctaattatgtgttcgatgcttcggcaactgatttctactgtagt
ttattaaatggatatccgcttgtcattgcaacatcagttgagcggaccaatacagattta 
ttagaaaagttaatttcacaagaaaatatcaccatcgcatctattccactacaggtatat 
aatgtgatgcatcatttctatattcctaaagtgattacaggaggtgcgccaagtactcca 
gcatttgttcaacatatttctaagcattgtgatatgtacgttaatgcctatgggccttct 
gaaaatacagttataacatcttgttggatatacgaaaaaggtgacgccataccatcgact
attccgattgggaaaccgttagctaatgttgatatttttattatgtcaggcggtaaacta 
tgtggcgttggtattccaggtgaattatgtattgcaggagaaagtttaacttcaggatat 
ttaaacagacccgaactttctgctgaaaaatttataaataatccttttgggccaggacaa 
ctttatcgaagtggtgatttagcacgattgatgccagatgggcaaattgaatttcttggt
agaatagacaagcaagttaaagtacatggctatcgcattgaactaggtgaaattgaaaat 
atcattaattcagtagatactgttacagatagcgttgttattttagctaaacagggtgag 
cgtgaagtgctgcatgcttattatgttggaagtcaagaagatgaaagtcatatttcacaa 
catttaaatcaatatttgcctaaatacatgattcctaagacattaacagctattagcgaa 

attccattaacaggaaatgataaggtggatgagtcaagattacctgtacctaatgtacac
aaaaataaatttgttgcaccacgtaataatatcgaacgagaaatagcacaaatcgttagc 
ggagtgttggacgtatcgtctatgagtatagatgatgacttctttgaaatgggtggtaca 
tcactagatgctatggtggtagtatcaaaactaaaatcaaatggcatacacattacaatg 
caagatgtatatcaatttaaaactgttcgttatatagctaatcacacagaaaaacgccaa 

gcactaccagaagtagtattaccagatcatctaccacaattacaatctttggttgaaaga
cgataccaactaaaatcacaacacctaacgcaatcatctctaggtcatgtattgctaact 
ggtgcaacagggttcctaggcgcatatttaattgatgaaatgcaagatgatgctgatcaa 
attacatgtattgtcagaggtcatgatatcaatcaagctaaaactaacttggaaaataat 
ttaaattgttattttgatacggctcatgtggataaattaatgaagcacattgatattatt 

ttagcggatttatcagaacttgaccatcttattatcgattcagccattgatacaattatt
catgctggagctcgtacagatcactttggcgatgatgaaacatttttcgatgtcaatgta 
agaagtacacaagcattaattgatttagctaaggataaaaaagcgaaattaatctatata 
tcaacgataagtgtgggtacggtatttgaagtacatcaagacgatattacattttctgaa 
aaagatttatataaaggccagttatttacatcaccatacactaaaagtaagttttatagc 

gagattaaagtgttagaagcggttaatgaaggtttagcagctcagattataagattagga
aatctgacaagtgcttctactggaccattaaatatgaaaaatttaacaactaatcgtttt 
agtattgtcatgcatgatttattaaaaatgccgtttataggagaaagtatatcgaaagct 
aaagttgaattttcatttatcgatgtcacagcgcgccatattattaaattggcaagatcc 
aatgcaatacctattatttatcatgtatacgcaccatgttcgataactatgaaacaagta 

attgacaatgccaaagggtcagaaatgactgtagtaagtgatagtgagtttgaacagaaa
ttacatgaattaggtatgcatgaattgattggtcttaatagtaatggagataatcaaatt 
tcaggtgt 

Sequence 760 

VTYWVKLSRDIELRRLMYALLDVVQSQPVLRTQFVTDDFNQLKINLRDFFPFIEIKEVNE
MSQSIDLEAFFTRNLNSYHFNQLPLFNFKIYQFLDGAYLLLDFHATIFNESQLTPFLQQL 
NIAYTHSLKSEYSISDFYNWIKEMNQKMDQNQVVCPSKHFNVLNADGDNYAYIPVKNTSE
KKKMCSLHAELPSLDIDAWIVSIYLAHHFISQSSDVTLGIHFSIDNKNTENMMVLNTDIA
PLNLSISQSDAVKDMVDECSALLEELQMCGASFVVQPKAVQIDVETMIHIEKVQEQFELN

HICHHIHRLYNEASSFADLEFYPHVQGGFDIVYNDNVYDELTVNTLVKLINGIYMQITQN
PSLLIKDIKLSDRSDLAKYNDINLQNNDINYSEVTYKTVVERFERQVHQHPDSIALQYEQ 
RSMTYHQLNQCANLLAYRLRLNHQIEPNDMVALIAERSLEMIIGMLGILKAGAGYIPIDP
DYPEERMNYIIEDAKPKAVVTYRTSFQSGLPQMDIELIVDSREHDIDNPRGINCSEDIAY 
VIYTSGTTGKPKGTLVPHRGIDRLVHNPNYVELNENTTVLLSGTVAFDAATFEIYGPLLN

GGRLVITSKDTLLNPQLLDQAITENKVNTMWLTSSLFNQIASERIEALESLTYLLIGGEV
LNAKWVHLLNSRECHPQIINGYGPTENTTFTTTFAIPQEMPSRIPIGLPISGTTVYVMQG
NRICGVGVPGELCIGGAGLAKGYLNQPKLTAERFIQSPFNNEMLYRSGDLVRLQEDGYID
YISRIDKQVKIRGFRIELSEIEKALEAIRDINKAVVIVREQDQDKQIVAYYEASQLKSTG
QLKDILSETLPEYMIPVHFMKVDRIPITMNGKLDVRALPEINLKNNRNYVEPRNDIERTV 

CRIFEEILHVDQVGVKDNFFELGGHSLRATLVVNRIEERLKKRLKVGDLMKSPTVEQLGQ
QIEELQNDVYEVIPKANESYQYDLSASQKSMYLLWKVNPKDTVYNIPFLWRLSSELNVMQ
LQRALSKLIERHEILRTQYVIDDNEVKQRIATHVSPDFEEVTTSLTNEQDIIQSFMEPFD 
LEQPSQMRVKYIHGPQQDYLFMDTHHSINDGMSNTILLSDLNALYQDKSLPELKLQYKDY 
SEWMVHRDLSKQRHFWLQQFENQVPILNMPTDYPRPSIKTTNGNMLTFHYNRQIKQQLKS 

YVEQHQVTDFMFFASAIMVLLHKYTRQDDIAIGSVISARTHRDTENMLGMFANTLVYRGR
PHDQKTWDQLMAEMKEMCLGAYEHQEYPFESLVNDLVDERDASHNPLFDVMLVLQNNETN 
HANFGHSQLTHIPPQSTTAKFDLSFIIEEDQDDYVVNIEYNTDLYKQETIHHIAEQLQMI
IKHVISTENLKIQDIDENDDLLIWLDKHVNDCSLDLPKNKSIQQLLHDVMKAKADDVALK 
MNGQSMTYQELDDYSNSMAQTLIQNGIQKGERVALLTERSFEMVASMIAVLKVGGSYVPI 
DVTYPDKRIEFIIEDAEVAAVLTYGKAISSHIPVIKIEDIDNTENNKRLNIEYAGNLEDD
MYHIYTSGTTGKPKAVSVKQRNILNLVCAWTKRLNLSDDEVYLQYANYVFDASATDFYCS 
LLNGYPLVIATSVERTNTDLLEKLISQENITIASIPLQVYNVMHHFYIPKVITGGAPSTP 
AFVQHISKHCDMYVNAYGPSENTVITSCWIYEKGDAIPSTIPIGKPLANVDIFIMSGGKL
CGVGIPGELCIAGESLTSGYLNRPELSAEKFINNPFGPGQLYRSGDLARLMPDGQIEFLG
RIDKQVKVHGYRIELGEIENIINSVDTVTDSVVILAKQGEREVLHAYYVGSQEDESHISQ
HLNQYLPKYMIPKTLTAISEIPLTGNDKVDESRLPVPNVHKNKFVAPRNNIEREIAQIVS
GVLDVSSMSIDDDFFEMGGTSLDAMVVVSKLKSNGIHITMQDVYQFKTVRYIANHTEKRQ
ALPEVVLPDHLPQLQSLVERRYQLKSQHLTQSSLGHVLLTGATGFLGAYLIDEMQDDADQ 
ITCIVRGHDINQAKTNLENNLNCYFDTAHVDKLMKHIDIILADLSELDHLIIDSAIDTII
HAGARTDHFGDDETFFDVNVRSTQALIDLAKDKKAKLIYISTISVGTVFEVHQDDITFSE
KDLYKGQLFTSPYTKSKFYSEIKVLEAVNEGLAAQIIRLGNLTSASTGPLNMKNLTTNRF 
SIVMHDLLKMPFIGESISKAKVEFSFIDVTARHIIKLARSNAIPIIYHVYAPCSITMKQV
IDNAKGSEMTVVSDSEFEQKLHELGMHELIGLNSNGDNQISGV 

Sequence 761

Contig_0513_pos_522_938, 

putative peptide of unknown function 

atgttcccaccccgaacacctagtagagatgccactaacccacctcaacaactcttacaa 
ttactaggaataccgtcaaatattacccatctcacatacaatttttctgagcatgcatta
ccttggataagttttatcgtacactatagtttttctatcgctattgcaataatctatatt 
tatatcgcaaagaaatatacaaaaatcacactaggttatggtgctttatttggtatagtt 
atttggattgtttttcatttaatcttaatgccaattatgcatgtcgtaccgaatgctttt 
gatcaaccattttcagaacacctatcagaattttttggacacattgtttggatgattacg 
tctacgtacaacacgcattttcattttttgtcgttgttttttcttattttcttgtga

Sequence 762 

MFPPRTPSRDATNPPQQLLQLLGIPSNITHLTYNFSEHALPWISFIVHYSFSIAIAIIYI
YIAKKYTKITLGYGALFGIVIWIVFHLILMPIMHVVPNAFDQPFSEHLSEFFGHIVWMIT
STYNTHFHFLSLFFLIFL*

Sequence 763

Contig_0513_pos_10290_9811,
putative peptide of unknown function 

atgagtatgttagtaacacttaaaggcttacccttagcttataataaagacatgcaagaa
gataaagaaggtttatttgatgctgtacacacacttaaaggctctcttcgaatcttcgaa 
ggtatggttgcatctatgaaagttaattcaaaccgtttaagtcaaacagtaaaaaatgat 
ttttcaaatgcaacagaattagcagactatttagtcagtaaaagtgtaccttttagaacc 
gctcatgaaatcgttggtaaaatcgtattaaattgtattcataaaggtatatacctatta 

gacgtacctttaagcgaatatcaagaacatcatgagaatattgaggaagatatatatgat
tatttaacacctgaaaattgtctcaagcgtcgccaaagctatggttcaactggtcaagaa 
tcagtaaaacatcaactaaaagtcgcaaaagcattattaaaagacaacgaatcaaaatag 

Sequence 764

MSMLVTLKGLPLAYNKDMQEDKEGLFDAVHTLKGSLRIFEGMVASMKVNSNRLSQTVKND
FSNATELADYLVSKSVPFRTAHEIVGKIVLNCIHKGIYLLDVPLSEYQEHHENIEEDIYD 
YLTPENCLKRRQSYGSTGQESVKHQLKVAKALLKDNESK* 

Sequence 765

Contig_0513_pos_9403_9059,

putative peptide of unknown function 

gtgacaaaccggaggaaggtggggatgacgtcaaatcatcatgccccttatgatttgggc 
tacacacgtgctacaatggacaatacaaagggcagcgaaactgcgaggtcaagcaaatcc 
55 cataaagttgttctcagttcggattgtagtctgcaactcgactatatgaagctggaatcg 
ctagtaatcgtagatcagcatgctacggtgaatacgttcccgggtcttgtacacaccgcc 
cgtcacaccacgagagtttgtaacacccgaagccggtggagtaaccatttggagctagcc 
gtcgaaggtgggacaaatgattggggtgaagtcgtaacaaggtag 
Sequence 766

VTNRRKVGMTSNHHAPYDLGYTRATMDNTKGSETARSSKSHKVVLSSDCSLQLDYMKLES 
LVIVDQHATVNTFPGLVHTARHTTRVCNTRSRWSNHLELAVEGGTNDWGEVVTR* 

Sequence 767

Contig_0513_pos_5415_3979,

is similar to (with p-value 0.0e+00) 

>gp:gp|AF054624|AF054624_1 Lactobacillus sakei transcription
-repair coupling factor (mfd) gene, partial cds; L-lactate d
ehydrogenase (ldhL) gene, complete cds; and unknown genes. N
ID: g3511014.

atgcaagattttccggtcgaaattcaattggtaagtcgattccgcacagctaaagaaata 
agggaaactaaagaagggctcaaatcaggatatgttgacattgtcgtaggtacacataaa 
ttattaggtaaagatattcaatataaagatttgggattgcttattgttgatgaagaacaa 

cgttttggagtgcgacataaagaacgcattaaaactttgaaaaaaaacgttgatgtactg
acgcttactgcaacaccaataccaagaacattgcatatgagtatgttaggtgtacgtgac 
ttatcagtgattgaaacaccacctgaaaatcgttttcctgtacaaacttatgtcttagaa 
cagaatacgaactttattaaagaggcattagagcgtgaattatctcgcgatggacaagta 
ttttatttgtataacaaagtgcagtccatttatgaaaaaagagaacaacttcaaaggtta 

atgcctgacgctaacattgctgtagcacatggccaaatgactgaacgtgatttagaggaa
acaatgttaagctttattaatcacgagtacgatattttagtaacgactacaattattgaa 
acaggtgtagatgtaccaaatgctaatactttaatcatagaagaggctgatcgttttggt 
ttaagccagctataccaattaagaggacgtgtaggacgttcaagtagaattggttacgct 
tatttcttacatccagctaacaaagtgttaaatgagactgctgaagagcgattgcaagct 

attaaggagtttaccgaactaggttcaggttttaaaatcgctatgcgagatttaaatatt
cgtggtgcaggcaatttactcggtaagcaacaacatggctttattgattcggttggtttc 
gatttatactctcaaatgttagaagaagcagtaaacgaaaaacgtggcattaaagaagaa 
tcgccggatgcaccagatattgaagtagaattgcacttagatgcttatttaccagctgaa 
tatatacaaagtgaacaggctaaaattgagatttataaaaaacttcgaaaagtagaaact 

gaagaacaacttttcgatgtcaaagatgaattaatagatcgttttaatgattatccaatt
gaagtcgaacgattattagatattgttgaaatcaaagtccacgctctacatgcgggtgtc 
gaattgataaaagacaaaggcaaatctatacaaatcattttatcacctaaagcgactgaa 
gatattaatggagaagaattgtttaaacagacgcaacctcttggtagagcaatgaaagtt 
ggcgtgcaaaataatgcaatgaatgtaacgctaacaaaatcaaaacaatggttagatagt 

ttgaaattcttagttagatgtattgaagaaagtatggcgattaaagatgaagactaa

Sequence 768

MQDFPVEIQLVSRFRTAKEIRETKEGLKSGYVDIVVGTHKLLGKDIQYKDLGLLIVDEEQ 
RFGVRHKERIKTLKKNVDVLTLTATPIPRTLHMSMLGVRDLSVIETPPENRFPVQTYVLE 

QNTNFIKEALERELSRDGQVFYLYNKVQSIYEKREQLQRLMPDANIAVAHGQMTERDLEE
TMLSFINHEYDILVTTTIIETGVDVPNANTLIIEEADRFGLSQLYQLRGRVGRSSRIGYA
YFLHPANKVLNETAEERLQAIKEFTELGSGFKIAMRDLNIRGAGNLLGKQQHGFIDSVGF 
DLYSQMLEEAVNEKRGIKEESPDAPDIEVELHLDAYLPAEYIQSEQAKIEIYKKLRKVET 
EEQLFDVKDELIDRFNDYPIEVERLLDIVEIKVHALHAGVELIKDKGKSIQIILSPKATE 

DINGEELFKQTQPLGRAMKVGVQNNAMNVTLTKSKQWLDSLKFLVRCIEESMAIKDED*

Sequence 769 

Contig_0513_pos_3926_2448,

is similar to (with p-value 6.0e-48) 

>sp:sp|P37555|YABM_BACSU HYPOTHETICAL 57.4 KD PROTEIN IN MFD
-DIVIC INTERGENIC REGION. >gp:gp|D26185|BAC180K_120 B. subti
lis DNA, 180 kilobase region of replication origin. NID: g46
7326. >gp:gp|Z99104|BSUB0001_57 Bacillus subtilis complete g
enome (section 1 of 21): from 1 to 213080. NID: g2632267.

gtgaagatactaagtgccatttatcgcattccgtatcaaaatgttttaggtgatgacggt
ttatatgcttatcaacaaatatatcctgtcgtagcactaggggttattttatctatgaat 
gctattccaagtgctgtgactcaagtgataggtgttaatcgatccgatgaagtctataca 
agggttatgtttcgattacaatgcataggttttatcgtctttattttgctttttatgttt
gcgaatatgattacccgatggatgggcgattctaatttagcacccatgttaaagatggcc 
agttttagttttattttaataggtgtcttaggagtgttaagaggattttatcaatcaaaa
caagtaatgaccataccagcaatttcccaggttatagaacaggtaattagagttagttta 
atcattgttgcaattattatgttttcaatgaaacactggtctatttatcaagcaggagca 
ttagctatattggcatcttcgattggttttttaggttcaatgttatatttattacttaaa 
aaaccacttaaacttaagttatgctatcgctttaataatacttccattcaatggaagcag
ttgtttatttccatatccatatttgcattgagtcaacttatcgttattttatggcaagtt 
gtggatagttttacaataatacgtttattacaacatagcggtattgcttttaaagaagca 
attattcaaaaaggcatttatgatcgtggtgcttcatttatacaaatgggtttgattgta 
actacgacttttagtttcgttcttatcccattacttactcaagcaattcgtgaacataat 

caaattcatatgaatcgttatgcaaatgcatcaattaaaatcacggtagtaataagtaca
gcagctagtataggattaattaatctgcttccacttatgaatgttgtattctttaaaagt 
aatcatttaactctaactttgagtgtttatatgtttacagtgatatgtgtttcgttaata 
atgatgaatatctcattattacaagttcaaaccagtattcgtcccattattatgggtgtg 
ataataggaatactgtccaaaattattttaaatgttatattaatacctttttggggtatc 

15 gtgggtgcaagtgtgagtacagtcttatcactactactttttgtcataatattgcaagtt 
gcagtcttaaagtactaccgttttaatcgtatatctttatttatcgttaaacttatttta 
ggtatgataattatgagtatagttgttcaaactgtcatgcttgccttaccttcaaaaagt 
aggatgttaggattactagaacttatagttagctcaattataggcatagtgattataatg 
ttgtatattattatatttaatgtattaggatacaaagaaataaagcacttaccttttgga 

20 gacaaattatatcaaatgaagagaggaagacggtcatga 

Sequence 770 

VKILSAIYRIPYQNVLGDDGLYAYQQIYPVVALGVILSMNAIPSAVTQVIGVNRSDEVYT
RVMFRLQCIGFIVFILLFMFANMITRWMGDSNLAPMLKMASFSFILIGVLGVLRGFYQSK
QVMTIPAISQVIEQVIRVSLIIVAIIMFSMKHWSIYQAGALAILASSIGFLGSMLYLLLK
KPLKLKLCYRFNNTSIQWKQLFISISIFALSQLIVILWQVVDSFTIIRLLQHSGIAFKEA
IIQKGIYDRGASFIQMGLIVTTTFSFVLIPLLTQAIREHNQIHMNRYANASIKITVVIST
AASIGLINLLPLMNVVFFKSNHLTLTLSVYMFTVICVSLIMMNISLLQVQTSIRPIIMGV
IIGILSKIILNVILIPFWGIVGASVSTVLSLLLFVIILQVAVLKYYRFNRISLFIVKLIL
GMIIMSIVVQTVMLALPSKSRMLGLLELIVSSIIGIVIIMLYIIIFNVLGYKEIKHLPFG
DKLYQMKRGRRS*

Sequence 771 

Contig_0513_pos_2112_1261,

is similar to (with p-value 3.0e-60)

>sp:sp|P37556|YABN_BACSU HYPOTHETICAL 56.1 KD PROTEIN IN MFD-DIVIC
INTERGENIC REGION. >gp:gp|D26185|BAC180K_121 B. subtilis DNA, 180
kilobase region of replication origin. NID: g467326.
>gp:gp|Z99104|BSUB0001_58 Bacillus subtilis complete genome (section
1 of 21): from 1 to 213080. NID: g2632267.

gtgaaggtacttggaggaaaaagttttattgatgacatttttgaagcggttgatgtagac 
cccaatgatggttttacactgcttgatggtacgtcattaaaagaatcggccttaaatgtt 
cgtacaaatacagtaattactcaagtttatagcgtaatgatagctgccgatttaaaactt 
actttaatggaaagatatcctgatgattttaatgtgaaaataattactggttctcatagt 

45 gatggagctcacgtaattgaatgcccactttatgaaattgatcgctacgacgattatttt 
aataatcttacaagtttatttattccaaaaatcaatgaggatacattactttatcaagat 
tttgattacgcagttcaaactattgatttactcgttgataatgaaaaagggtgtccgtgg 
gataaagtacaaactcacgactcattaaaacggtatcttttagaagaaacgtttgaatta 
tttgaagccattgataatgaagatgattggcatatgatagaggaattaggagatatactt

50 ttacaagtattattacattctagtataggtaaaaaagaaggatatattgatatcaaagaa 
attatagaaagtctcaacaccaagatgattcatagacatccacatatctttggtaatgcg 
catgtaacttcgcaagaggatttaaaagacatttggtcacgtgctaaagaaaaagaaggt 
aaagtgcctcgtgttaaatttgagaaagtatttgcagaccacttcttgaaattgtatgat 
aaaacaaaaaataggcaatttgacgaagatgatctcaaacaatttttacaacaaggagag 

55 aaaaattcatga 

Sequence 772 

VKVLGGKSFIDDIFEAVDVDPNDGFTLLDGTSLKESALNVRTNTVITQVYSVMIAADLKL
TLMERYPDDFNVKIITGSHSDGAHVIECPLYEIDRYDDYFNNLTSLFIPKINEDTLLYQD
FDYAVQTIDLLVDNEKGCPWDKVQTHDSLKRYLLEETFELFEAIDNEDDWHMIEELGDIL
LQVLLHSSIGKKEGYIDIKEIIESLNTKMIHRHPHIFGNAHVTSQEDLKDIWSRAKEKEG 
KVPRVKFEKVFADHFLKLYDKTKNRQFDEDDLKQFLQQGEKNS*

Sequence 773

Contig_0515_pos_5400_6044,

putative peptide of unknown function 

atgtctaaaatgcaaactgcccaaacaatgattatcatgcaagtatctatctttatcatc 
gctgcattgttaatcattttcatgcaagcaaccattaaaaatcctactcaattagaacaa 

10 ggacataaggaacctaaaagatatattttcgcctgggttctattaggattttgtattgtg 
atgatttaccaagtcatcatcagtataattttatttgccatcaatggcagtccgcaaaga 
agtccaaacactgaaaggttaatggctattgctaagcaaatgccgatatttatcgtttta 
atatccatagtaggtcccattttagaggaatacgtttttcgtaaagttatcttcggagaa 
ttgtataactttataaagggatcacgtgtggtaagctttatcattgcttctatagtaagt 

15 tctctaatattcgctttagcacataatgacttcaaatttataccggtatattttggaatg 
ggagtcattttctcacttgcttatgtttacacaaaacgaattgctgtacctataggcata 
cacatgcttatgaatggttcggttgtattaactcaagttgtgggtggagattctattaaa 
aaattgcaagaacaagcaacatttatattccatcttatattttaa 

Sequence 774

MSKMQTAQTMIIMQVSIFIIAALLIIFMQATIKNPTQLEQGHKEPKRYIFAWVLLGFCIV
MIYQVIISIILFAINGSPQRSPNTERLMAIAKQMPIFIVLISIVGPILEEYVFRKVIFGE
LYNFIKGSRVVSFIIASIVSSLIFALAHNDFKFIPVYFGMGVIFSLAYVYTKRIAVPIGI
HMLMNGSVVLTQVVGGDSIKKLQEQATFIFHLIF*


Sequence 775 

Contig_0515_pos_7937_8731,

is similar to (with p-value 0.0e+00)

>gp:gp|AF012132|AF012132_6 Staphylococcus epidermidis agr system
including response regulator (agrA), histidine kinase (agrC), AgrD
(agrD), AgrB (agrB) and delta toxin (hld) genes, complete cds. NID:
g2981293.

gtgaatatattgaaaatccaaatacttcaattcaatgtagaacgtggaaatgttgataaa 
aatatgcaaaatatcaaaactaagtttaatcaatacttagataaagataccagtgtcgtc 

35 gtgcttccagaaatgtggaataacggttatgcattagaagaattagaacaaaaagctgat 
aaaaatcttaaagacagctctctctttataaaagacttagcacatacatttaatgtagat 
atcattgcaggttcagtgtcaaatataagagaaaaccatatatataatactgcttttgca 
attaataaaaacaaagaattgattaatgaatatgacaaagtacatctcgtgccaatgtta 
cgtgagccagactttttatgtggtggaaatgtagtccctgaacctttttatttatctgat 

40 caaacacttttgacgcaaatcatttgttatgacttgcgatttccagagatattgcgctat 
ccagctagaaaaggtgctaaaattgctttttatgtagcgcagtggcctagctcaagacta 
gatcattggttatcattactaaaagcgagagcaatcgaaaatgatatttttattgtagct 
tgtaatagttgtggtgatgatggtcacaccaattatgctggaaattcaattgtcattaat 
cctaatggtgaaattttagaccatttagatgataaagaaggtgtactaacaacacatatc 

45 gatgtagacttagtagatcaacaaagagaatatattccagttttcagaaatctaaaacca 
catctttataaatag 

Sequence 776 

VNILKIQILQFNVERGNVDKNMQNIKTKFNQYLDKDTSVVVLPEMWNNGYALEELEQKAD 
KNLKDSSLFIKDLAHTFNVDIIAGSVSNIRENHIYNTAFAINKNKELINEYDKVHLVPML
REPDFLCGGNVVPEPFYLSDQTLLTQIICYDLRFPEILRYPARKGAKIAFYVAQWPSSRL
DHWLSLLKARAIENDIFIVACNSCGDDGHTNYAGNSIVINPNGEILDHLDDKEGVLTTHI
DVDLVDQQREYIPVFRNLKPHLYK*

Sequence 777

Contig_0515_pos_9479_10069,

is similar to (with p-value 5.0e-45)

>gp:gp|Z49220|SEHLDGN_2 Staphylococcus epidermidis hld and agr[A,B,C,D]
genes. NID: g3320006.

atgacacttgaggagagtagaaaacaagtgaaaatcatcgataaaaaaattgagcaattt
gctcaatatttacaacgtaaaaataacttagatcacattgagtttctaaaagttcgttta 
gggatgcaagtagttgctggtaatattgaaaaaacagtggttctatatggactatcttat 
ttttttgatttgctcatttttacatttttaactcatattagttactttctattaagaata 
tttgctcatggtgcacatgctaaaacaactcttcaatgtcatatacaaaatattctttat
tttttatttttaccttggctagtactacaccttcctcttagtacaaatatattttatttt 
ttagccatgattagttttttattagtaatatcttttgcaccggctgcaacaaagaaacaa 
cctatacctaaacgtttacttaagaagaaaaaagtactctccatattaagttttattgta 
atcataacaatcgctttaacactagaagaagtattcaaaaaaaatgttatctcgggtgtt 
10 gtaatagagtctattacacttttaccaatattttttcctaaggaggattaa 

Sequence 778 

MTLEESRKQVKIIDKKIEQFAQYLQRKNNLDHIEFLKVRLGMQVVAGNIEKTVVLYGLSY
FFDLLIFTFLTHISYFLLRIFAHGAHAKTTLQCHIQNILYFLFLPWLVLHLPLSTNIFYF
LAMISFLLVISFAPAATKKQPIPKRLLKKKKVLSILSFIVIITIALTLEEVFKKNVISGV
VIESITLLPIFFPKED*

Sequence 779 

Contig_0515_pos_10231_11529, 

is similar to (with p-value 0.0e+00)

>gp:gp|AF012132|AF012132_2 Staphylococcus epidermidis agr system
including response regulator (agrA), histidine kinase (agrC), AgrD
(agrD), AgrB (agrB) and delta toxin (hld) genes, complete cds. NID:
g2981293.

25 gtgtattcaatgggtaaacttgactttttaccatttgcagctatacaagtgtttcttttg 
gtttgggttacaaaaactattgctaatattaaatttgtaagaaaggattatattttcatt 
actggaattataatcctttctgcaatattatataatgtttatgcaagccaagcacttgta 
cttgtagtaataatgattataattttcttctattcaaaagtaagatggtattctattgtt 
atagtgttaatgagcactttgttgtcatatttaacaaattttattacagtagctatcagt 

ttatatactgaaaatataatacataatatttatttttataatatctttcatttttcaatt
tttatcattttatctttaattttggcacatttatttaaacacttattaattaggtttagg 
tattcttatttatatttaagcaaaaggtattacattattatttcttttgtattagctatt 
gcttttatatacttctatataatttcacaaactaatttacaagaaagtaatagcttgaac 
ttttatgctattatttttgtttctattaccgtacttttgagtttggttatattattgtta 

35 tcggctttcgcactacgtgaaatgaaatataaacgtaagctacaagaaatcgaagcatat 
tatgagtacacgttacgtatagaaagcattaacaatgaaatgcgtaagttccggcatgat 
tatgtgaatatcctcaccactctttcagattacattagagaagatgatatgcctggatta 
cgtaaatattttaatgaaaatatcgttccaatgaaagataaattaaaaactcgctctatt 
aaaatgaatggtattgaaaagttgaaagtgagagaaattaaagggctgattactactaaa 

40 attattcaagctcaagaaaaacgtattccaattagtattgaggttcctgatgaaattgat 
cgtatctctatgaatactgttgagcttagtcgtattatcggtattatagttgataatgct 
attgaagcttcagaaaatcttgaggaaccactcatcaatatcgcattcatcgataatgag 
gaatctgtcacttttatcgttatgaataaatgtagtgatgatatccctaaaattcatgag 
ttgtttgaacaaggtttttctactaaaggtgataatcgcggtttaggtttatcaacttta 

45 aaagaactgacagactcaaacgagaatgttttattagatactgtcatcgaaaatggttac 
tttgtacaaaaagtagaaataaataataaggaatcataa 

Sequence 780

VYSMGKLDFLPFAAIQVFLLVWVTKTIANIKFVRKDYIFITGIIILSAILYNVYASQALV
LVVIMIIIFFYSKVRWYSIVIVLMSTLLSYLTNFITVAISLYTENIIHNIYFYNIFHFSI
FIILSLILAHLFKHLLIRFRYSYLYLSKRYYIIISFVLAIAFIYFYIISQTNLQESNSLN
FYAIIFVSITVLLSLVILLLSAFALREMKYKRKLQEIEAYYEYTLRIESINNEMRKFRHD
YVNILTTLSDYIREDDMPGLRKYFNENIVPMKDKLKTRSIKMNGIEKLKVREIKGLITTK 
IIQAQEKRIPISIEVPDEIDRISMNTVELSRIIGIIVDNAIEASENLEEPLINIAFIDNE 
55 ESVTFIVMNKCSDDIPKIHELFEQGFSTKGDNRGLGLSTLKELTDSNENVLLDTVIENGY 
FVQKVEINNKES* 

Sequence 781 

Contig_0515_pos_11636_12262,

is similar to (with p-value 0.0e+00)

>gp:gp|AF012132|AF012132_1 Staphylococcus epidermidis agr system
including response regulator (agrA), histidine kinase (agrC), AgrD
(agrD), AgrB (agrB) and delta toxin (hld) genes, complete cds. NID:
g2981293.

atggagttagctttagcaacaaatgatccttatgaggtcttagagcaatcaaaagaactt 
aatgacattggttgttacttccttgatattcaattagaagctgatatgaacggtattaaa 
ttagccagtgaaattcgtaaacatgatcctgttggtaatattatatttgtaaccagtcac 
agtgagctgacttatttgacgtttgtttataaagtggctgctatggattttatttttaaa 

10 gatgatccatctgaattaaaaatgagaatcatagattgtctcgaaacagcacatacacga 
ctcaaattattatcaaaagaaagtaatgtagatacgattgagttaaagcggggaagtaat 
tcagtatacgttcaatatgatgatattatgttttttgaatcatctacgaaatctcatagg 
ctcattgcacatcttgataatcgacaaattgaattttatggaaatttaaaggaattagca 
cagcttgatgaacgtttctttagatgtcataacagttttgtgataaacaggcataatatt 

15 gaatctattgactcaaaagaacgtattgtttactttaagaatggcgaaaattgtttcgct 
tcagtacgtaatgttaaaaaaatataa 

Sequence 782 

MELALATNDPYEVLEQSKELNDIGCYFLDIQLEADMNGIKLASEIRKHDPVGNIIFVTSH 
20 SELTYLTFVYKVAAMDFIFKDDPSELKMRIIDCLETAHTRLKLLSKESNVDTIELKRGSN 
SVYVQYDDIMFFESSTKSHRLIAHLDNRQIEFYGNLKELAQLDERFFRCHNSFVINRHNI 
ESIDSKERIVYFKNGENCFASVRNVKKI* 

Sequence 783

Contig_0515_pos_13997_13296,

is similar to (with p-value 0.0e+00)

>sp:sp|Q05936|SCRB_STAXY SUCROSE-6-PHOSPHATE HYDROLASE (EC 3.2.1.26)
(SUCRASE) (INVERTASE). >pir:pir|A47059|A47059 sucrase ScrB -
Staphylococcus xylosus >gp:gp|X67744|SXSCRBA_2 S.xylosus scrB and scrR
genes. NID: g949973.

atgataggagatttaaactttaataatctatttttcgaccatgaaagttttcaagaattg 
gataatggttttgatttctacgcgccacaaacgtttgttgatgcagacgggcaacgcatt 
ttaattggatggatgggactaccagatacagagtatcctacagataaagaggggtgggca 
cattgccttactattcctcgagtacttaccattgaaaatggaaaacttaagcagcgacct 

35 tttaagcagttagaagatttaagaactaataaagaaacagctttgggatatgctaataaa 
tttaaacgtaaattacatccatatgaaggtaagcagtatgagatgattatagatatatta 
gaaaatgatgcttcagaaatatattttgaattgcgtagctctcgatctgaatctacactg 
attacttataataaacacgaaaataaactcactttagaacgtaccgatagtgggacacta 
ccatcaaatgtcgatggaacaacgcgttctaccattttagattcaccattaaaacagtta 

40 caaatttttgtggatacatctagtatcgaaatattctgtaatgatggtgagcgtgtttta 
acctcacgtattttcccaaatgaggatgctacaggtataaaagcttcgactgaatctggt 
caagtatatttaaaattcactaaatatgaattaaaagggtga 

Sequence 784

MIGDLNFNNLFFDHESFQELDNGFDFYAPQTFVDADGQRILIGWMGLPDTEYPTDKEGWA
HCLTIPRVLTIENGKLKQRPFKQLEDLRTNKETALGYANKFKRKLHPYEGKQYEMIIDIL
ENDASEIYFELRSSRSESTLITYNKHENKLTLERTDSGTLPSNVDGTTRSTILDSPLKQL
QIFVDTSSIEIFCNDGERVLTSRIFPNEDATGIKASTESGQVYLKFTKYELKG* 

Sequence 785

Contig_0515_pos_13289_12330,

is similar to (with p-value 0.0e+00)

>pir:pir|S20799|S20799 hypothetical protein 7 - Staphylococcus aureus
>pir:pir|S58482|S58482 hypothetical protein 7 - Staphylococcus aureus
>gp:gp|X52543|SAAGRAB_7 S. aureus agrA, agrB and hld genes. NID:
g46505.

atgagacgtctatttgctattggagaggcattaattgatttcataccaaatgtaacgcat 
tcaaaattaaaagatgttgaacaatttagtcgacaagttggtggcgcaccgtgtaacgta 
gcggctacagtaagtaaattaggtggaaaatcagaaatgataacacaactaggaaatgac
gcgtttggtgatatcattgtagaaacaattgaacaacttggcgtaggtacgcaatatatt
aagcgaactaataaagcaaatactgcattggcatttgtcagccttcaagatgatggtcaa 
agagatttctcattttatcgtaaaccttctgcagatatgttatatcaacctgaaaatatt 
gatgatattcaagtattccaagacgatattttacatttttgttctgtagatttaatagag 
5 agtgacatgaaatatgcccatgagaaaatgattgaaaaattcgagtcagtagatggtact 
attgtttttgaccctaatgtacgcttacctttatgggaagataaactcgaatgtcaacgt 
acaataaatgcgttcatacctaaagcacatattgttaaaatatctgatgaagaattatta 
tttattactggtaagaggaatgaagatgaagcgattcaatctttatttagaggtcaagtt 
aatgtagtgatttatacacaaggagcgcaaggtgcaactatttataccaaagatgattat 
10 cgtattcatcatgaggggtatcaagtacaagcaattgatacaactggtgctggtgacgca 
tttataggtgctattatttattgcatactcgagtctcggcattctgaatgtaaagattta 
tttaaggaaaagggcaaggatatactagcgtttagtaatcgtgtcgcagcacttacaaca 
acgaaacatggtgctattgaaagtttgccaacaaaagaagatataaaagactataattga 


Sequence 786 

MRRLFAIGEALIDFIPNVTHSKLKDVEQFSRQVGGAPCNVAATVSKLGGKSEMITQLGND
AFGDIIVETIEQLGVGTQYIKRTNKANTALAFVSLQDDGQRDFSFYRKPSADMLYQPENI
DDIQVFQDDILHFCSVDLIESDMKYAHEKMIEKFESVDGTIVFDPNVRLPLWEDKLECQR 
TINAFIPKAHIVKISDEELLFITGKRNEDEAIQSLFRGQVNVVIYTQGAQGATIYTKDDY
RIHHEGYQVQAIDTTGAGDAFIGAIIYCILESRHSECKDLFKEKGKDILAFSNRVAALTT
TKHGAIESLPTKEDIKDYN* 

Sequence 787

Contig_0515_pos_7564_6104,

putative peptide of unknown function 

atgcttacgggctttgctttcatggtaactacatcattattcagtcaccaagcacatgct 
gaaggtaatcatcctattgacattaatttttctaaagatcagattgatagaaatacagct 
aagagcaatattatcaatcgagtgaatgacactagtcgcacaggaattagtatgaattcg 

30 gataatgatttagatacagatatcgtttcaaatagtgactcagaaaatgacacatattta 
gatagtgattcagactcagatagtgacttagattcagatagtgattcagattcagacagt 
gactcagattcagatagtgactcagattcagatagtgactcagattcagacagtgattca 
gactcagatagtgactcagattcagacagtgattcagactcagatagtgattcagattca 
gatagtgattcagattcagacagtgactcagactcagacagtgattcagattcagatagt
gattcagattcagatagtgattcagattcagatagtgattcagattcagacagtgactca
gactcagacagtgattcagattcagacagtgactcagattcagatagtgactcagattca 
gatagtgattcagactctggtacaagttcaggtaagggttcacataccggaaaaaaacct 
ggtaaccctaaaggaaatacaaatagaccttctcaaagacatacgaatcaaccccaaagg 
cctaaatacaatcaaacaaatcaaaacaatataaacaatataaaccatataaaccataat 

40 attaatcatacacgtactagtggagatagtgcgccttttaaacgtcaacaaaatattatt 
aattctaacttaggtcatagaaatcaaaataatataaatcaatttatatggaacaaaaat 
ggcttttttaaatctcaaaataataccgaacatagattgaatagtagtgataataccaat 
tcattaattagcagattcagacaattagccacgggtgcttataagtacaatccgtttttg 
attaatcaagtaaaaaatttgaatcaattagatggaaaggtgacagatagtgacatttat 

45 agcttgtttaggaagcaatcatttagaggaaatgaatatttaaattcattacaaaaaggg 
acaagctatttcagatttcaatattttaatccacttaattctagtaaatactatgaaaat 
ttagatgatcaggttttagctttaattacaggagaaatcggctcaatgccagaacttaaa 
aaacctacggataaagaagataaaaatcatagcgtcttcaaaaaccatagtgcagatgag 
ataacaacaaataatgatggacactccaaagattatgataagaaaaagaaaatacatcga 

50 agtcttttatcgttaagtattgcaataattggaatttttctaggagtcactggactatat 
atctttagaagaaaagagtaa 

Sequence 788 

MLTGFAFMVTTSLFSHQAHAEGNHPIDINFSKDQIDRNTAKSNIINRVNDTSRTGISMNS 
55 DNDLDTDIVSNSDSENDTYLDSDSDSDSDLDSDSDSDSDSDSDSDSDSDSDSDSDSDSDS 
DSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDS 
DSDSDSDSDSDSDSDSDSDSDSDSDSGTSSGKGSHTGKKPGNPKGNTNRPSQRHTNQPQR 
PKYNQTNQNNINNINHINHNINHTRTSGDSAPFKRQQNIINSNLGHRNQNNINQFIWNKN 
GFFKSQNNTEHRLNSSDNTNSLISRFRQLATGAYKYNPFLINQVKNLNQLDGKVTDSDIY
SLFRKQSFRGNEYLNSLQKGTSYFRFQYFNPLNSSKYYENLDDQVLALITGEIGSMPELK
KPTDKEDKNHSVFKNHSADEITTNNDGHSKDYDKKKKIHRSLLSLSIAIIGIFLGVTGLY
IFRRKE* 

Sequence 789

Contig_0515_pos_4789_3170,

is similar to (with p-value 0.0e+00)

>gp:gp|U13618|SEU13618_2 Staphylococcus epidermidis 9759 heat shock
protein 10 (hsp10) and heat shock protein 60 (hsp60) genes, complete
cds. NID: g535340.

atggcaaaagatcttaaattctctgaagatgcgcgtcaagcaatgttacgtggtgttgat 
aaattagcaaacgctgtaaaggttacaattggacctaaagggcgaaatgtggttctagat 
aaggattacacaacacctttaattaccaacgatggtgtaacaattgctaaggaaatagag 
ttagaagatccatatgagaatatgggtgcaaaattagtgcaggaagttgcgaataaaaca 

15 aatgaaatcgctggggacggtacaactacagcaacagttttagcacaatcaatgattcag 
gaaggtcttaagaatgttacaagtggtgcaaatcctgtaggcttaagacaaggtattgac 
aaagcagtgcaagtggctatagaagcgcttcatgagatttctcaaaaggttgaaaataag 
aacgagatagcgcaagttggagctatttcagcagcagatgaagaaatcggtcgctacatt 
tctgaagcaatggataaagtaggtaacgatggcgttatcactattgaagaatcaaatggg
tttaatacagaattagaagtagttgaaggaatgcaatttgatcgcggttatcaatcacca
tatatggtaactgactcagataaaatgatagctgaattagaacgtccatatatattagta 
acggataagaaaatttcatcattccaagatattcttccattattagaacaagttgtgcag 
gctagtcgaccaattttaattgttgcggatgaagtagaaggcgatgcacttactaatatt 
gttttaaaccgtatgcgtggaacatttactgctgtagcagttaaagccccaggatttggt 

gatcgacgtaaagcaatgttagaagacctagcaatattaactggtgctcaagtcattact
gatgatttaggtttagaacttaaagatgcatctcttgatatgctaggtactgctaataaa 
gttgaagtgactaaagatcatacaacagtcgtagatggtaatggtgatgaaaataatatt 
gatgctcgtgtaggtcaaattaaagcacaaattgaagaaactgattcagagtttgataaa 
gaaaaattacaggaacgtttggcaaaactagctggcggcgtagctgttatcaaagtaggg 

30 gctgcaagtgaaacagagcttaaagaacgtaaattaagaattgaagacgcattaaattca 
acacgtgcggcggtggaagaaggtatcgttgctggtggtggtactgcgttagtcaatata 
tatcaaaaagtaagtgaaattaaagcagaaggtgatgttgaaacgggtgttaatatcgta 
ttaaaagcattacaagcacctgttagacaaattgctgaaaatgcaggattagagggttca 
attattgttgaacgtttaaaacatgctgaagcgggcgttggtttcaatgcagcaacaaat 

35 gaatgggttaatatgttagaagaaggtatagtagatccaactaaagtaactcgttcagcg 
ttacaacatgcagcaagtgtagctgctatgttcttaacaactgaagcagtcgttgctagt 
attccagagccagaaaataatgaacaacctggaatgggtggcatgccaggtatgatgtaa 

40 Sequence 790 

MAKDLKFSEDARQAMLRGVDKLANAVKVTIGPKGRNVVLDKDYTTPLITNDGVTIAKEIE 
LEDPYENMGAKLVQEVANKTNEIAGDGTTTATVLAQSMIQEGLKNVTSGANPVGLRQGID 
KAVQVAIEALHEISQKVENKNEIAQVGAISAADEEIGRYISEAMDKVGNDGVITIEESNG 
FNTELEVVEGMQFDRGYQSPYMVTDSDKMIAELERPYILVTDKKISSFQDILPLLEQVVQ
ASRPILIVADEVEGDALTNIVLNRMRGTFTAVAVKAPGFGDRRKAMLEDLAILTGAQVIT
DDLGLELKDASLDMLGTANKVEVTKDHTTVVDGNGDENNIDARVGQIKAQIEETDSEFDK
EKLQERLAKLAGGVAVIKVGAASETELKERKLRIEDALNSTRAAVEEGIVAGGGTALVNI 
YQKVSEIKAEGDVETGVNIVLKALQAPVRQIAENAGLEGSIIVERLKHAEAGVGFNAATN 
EWVNMLEEGIVDPTKVTRSALQHAASVAAMFLTTEAVVASIPEPENNEQPGMGGMPGMM* 


Sequence 791 

Contig_0515_pos_2752_1793,

putative peptide of unknown function

atgaaagacaacaaacctaataattcgaaattaattcaaacatatttaagtaagaaaact
ttaagatatggtacagcaagtgcattaacattggcactctatttatttaacagtaacgta 
actgtgtatgcggatgaaaatactgcaaaccaaaatcaaggaacatcaccaaaaacttca 
cagacagcacctacaaataatactgaaaatacagatgccacagccataacaacagatcaa 
aataataatgatgaagaagaatacgatgcgtcatatgaacttccaattctttatgtaact
gtctggctagatgatcaaggaaatattattaaagatgctgtggaagatgctaaaacccct
gcttcagaaaggcaaccggtgaaaattcctgggtaccaacattatagaacttctgtgagt 
gacggaattactaagtttatttatcgtaaaattagcactgcacaatcacctatagttgaa 
aataatcaacaagataataatacaaataaagttgttgaaacaaccaatcaaaataaagat 
5 gaagtgaatggaaaagaacaaaatcaagcaaatacttcagtaacaaatacacaaattacc 
aaaaacgagaaagacgaagacacaaaaacactaaagaaagataaagacgagaaagaatct 
aaagacacaaaaacaccaaagaaagacaaagaaaagaaagacataaaaactccgaagaaa 
gatagagaagagaaaaaaccagtaataccaaaaagcggcaaagacgagaaagacacaaaa 
ataactaagaaagacaaagaagacgaaattacaacaacttccaagaaagataataacaat 
10 gatgtacaagataaattaccggaaacaggtaaaacaaacgatattcaaaatcctgcttta 
ataatgttacttgctggtttaggtttattaggattatttagaaataaaataagagaatag 



Sequence 792 

15 MKDNKPNNSKLIQTYLSKKTLRYGTASALTLALYLFNSNVTVYADENTANQNQGTSPKTS 
QTAPTNNTENTDATAITTDQNNNDEEEYDASYELPILYVTVWLDDQGNIIKDAVEDAKTP 
ASERQPVKIPGYQHYRTSVSDGITKFIYRKISTAQSPIVENNQQDNNTNKVVETTNQNKD
EVNGKEQNQANTSVTNTQITKNEKDEDTKTLKKDKDEKESKDTKTPKKDKEKKDIKTPKK
DREEKKPVIPKSGKDEKDTKITKKDKEDEITTTSKKDNNNDVQDKLPETGKTNDIQNPAL 

20 IMLLAGLGLLGLFRNKIRE* 

Sequence 793 

Contig_0517_pos_750_1070,

is similar to (with p-value 5.0e-30)

>sp:sp|P54453|YQEH_BACSU HYPOTHETICAL 41.0 KD PROTEIN IN NUCB-AROD
INTERGENIC REGION. >gp:gp|D84432|BACJH642_92 Bacillus subtilis DNA,
283 Kb region containing skin element. NID: g2627063.
>gp:gp|Z99117|BSUB0014_47 Bacillus subtilis complete genome (section
14 of 21): from 2599451 to 2812870. NID: g2634966.

atgatacctggtgtatcaaacataaatgatttttcgtctaatggaatatctatcatatct 
aaagttgttcctggaaagcgtgatgtagttactacatctttttctcccacactctgttca 
attaatttattaattaatgtagattttccaacattcgttgtacctacaatgtatacgtca 
tctttatttcttacatggtttatagattgcaataattcatcaatcccccaacctttattt 
35 gcagaaataagaacgacatcttctgcttctaatccatatttacgagcagattttctcaac 
cattcttttacacgtcgatga 

Sequence 794 

MIPGVSNINDFSSNGISIISKVVPGKRDVVTTSFSPTLCSINLLINVDFPTFVVPTMYTS 
40 SLFLTWFIDCNNSSIPQPLFAEIRTTSSASNPYLRADFLNHSFTRR* 
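
The nucleotide and peptide records in this listing are paired, so a peptide can be cross-checked against its coding sequence. The following is a minimal Python sketch of such a check and is not part of the original listing. It assumes the standard genetic code and a literal codon-by-codon translation (so a GTG start codon reads as V, as the listed peptides appear to do); the names `CODON_TABLE` and `translate` are hypothetical helpers introduced only for this illustration.

```python
# Illustrative sketch only; not part of the original sequence listing.
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a nucleotide string codon by codon; '*' marks a stop codon.

    Whitespace is ignored and any trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# First 60 nucleotides of Sequence 793 as listed above.
first_line = "atgatacctggtgtatcaaacataaatgatttttcgtctaatggaatatctatcatatct"
print(translate(first_line))  # MIPGVSNINDFSSNGISIIS
```

The printed peptide matches the first 20 residues of Sequence 794. Applying the same check to other record pairs is one way to spot scanning errors in either the nucleotide or the peptide line.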

Sequence 795 

Contig_0517_pos_2983_3453,

is similar to (with p-value 4.0e-29)

>gp:gp|D50453|D50453_106 Bacillus subtilis DNA for 25-36 degree
region containing the amyE-srfA region, complete cds. NID: g1805369.

atgagaattgcacctaatgtgattggtagaatccaaccattaatcgcaccagctattata 
agtaaactcaccggtttaccaataaataagaaaacaaaagttgaaattacaataaatgta 

50 ataacgataagattatttttattgagtaacgatttgtgtagtgtttttaaaaatgttgcg 
cttgtatatgcagaaccaattactgaggacattgctgctgcaaatattactacgccaaaa 
atatttttacctataggacctaatgcatgttggaaaactgatgctggtggattttctgaa 
ctaagcgtaacgccagttacaacaacacctagtacagctaaaaacaataaggtgcgcatg 
acaccagttgttaaaatacctgctacagcagatcgatttacgaaaggaaggtatgactta 

55 ccttttataccagaatctagaattctatgtgcacctgcaaaagtaatataa 

Sequence 796 

MRIAPNVIGRIQPLIAPAIISKLTGLPINKKTKVEITINVITIRLFLLSNDLCSVFKNVA
LVYAEPITEDIAAANITTPKIFLPIGPNACWKTDAGGFSELSVTPVTTTPSTAKNNKVRM
TPVVKIPATADRFTKGRYDLPFIPESRILCAPAKVI*

Sequence 797

Contig_0517_pos_8261_7662,

is similar to (with p-value 3.0e-41)

>sp:sp|P42967|YCSJ_BACSU HYPOTHETICAL 63.8 KD PROTEIN IN SIPU-PBPC
INTERGENIC REGION. >gp:gp|D38161|BAC39R_12 Bacillus subtilis genome
around 39 degrees region encoding 17 ORFs, complete cds. NID:
g1032472. >gp:gp|Z99106|BSUB0003_56 Bacillus subtilis complete genome
(section 3 of 21): from 402751 to 611850. NID: g2632653.
>gp:gp|D50453|D50453_108 Bacillus subtilis DNA for 25-36 degree region
containing the amyE-srfA region, complete cds. NID: g1805369.

atgataccttcatatcgtgctattctaatttactttgataaatcggggataaacggaact 
15 gaattattagaaaatttagaactcgatgaaaattcgaacagtagaaaacaatcacatttt 
aaacagcgtatcattcatatacctgtattatatggtggagattttggtccagatttatca
gaagtagctaacgttaataaattaagtcaggaagaagttattcaaatacatacacaacaa 
ccttatctaatctatatgcttgggtttatgccgggtttcccatacttaggtggattggat 
gctaagttgcacacacctagacggtctgaacctagaatcaaaattaacgctggttctgtt 
20 ggaatagcaaataatcaaacaggtttatatcctatggactcacctggtggttggcagata 
attggtcgcacaccaataaaagtctttgatttaaataggacaccaatgacgttatatgaa 
gctggtgattacatacaattttatagtataaattatcaagagtttgaaaaaatatcaaac 
gatattaataaaggaaaatttgatatagataagtgggtgacatatcaagatgagtattaa 


Sequence 798 

MIPSYRAILIYFDKSGINGTELLENLELDENSNSRKQSHFKQRIIHIPVLYGGDFGPDLS
EVANVNKLSQEEVIQIHTQQPYLIYMLGFMPGFPYLGGLDAKLHTPRRSEPRIKINAGSV
GIANNQTGLYPMDSPGGWQIIGRTPIKVFDLNRTPMTLYEAGDYIQFYSINYQEFEKISN 
30 DINKGKFDIDKWVTYQDEY* 

Sequence 799 

Contig_0517_pos_7306_6668,

is similar to (with p-value 2.0e-25)

>sp:sp|P44298|YBGK_HAEIN HYPOTHETICAL PROTEIN HI1730.
>pir:pir|B64041|B64041 hypothetical protein HI1730 - Haemophilus
influenzae (strain Rd KW20) >gp:gp|U32845|U32845_11 Haemophilus
influenzae Rd section 160 of 163 of the complete genome. NID:
g3212236.

40 gtggctcaaagctattctacacatgttagaagtggaatgggcggatttaaaggtcgtgca 
ttaaagaaatacgatgttattgcaactcaagtaaatcataactataaaactaatttagga 
aaaacaattgatttctcatctatacctgataataattacatacacgtcattgaaggacct 
cagatcaatgaatttgatgaagaaacgatagctaaattcgtaaatagcgatttcaaaatt 
tctgatcaatcagatcgaatgggatacagattaaaaggtaatacagtaccacctaaaaat 

45 agtgctgatatcatttctgaacctgtcgctttgggaagtattcaagtacctaacgatggt 
aatcccattattcttttaaatgataagcaaacaattggtggttatacaaaaattgcaacg 
gtaacacaattagatttaagaaaattagcacagatgaagcctggagacattatacagttt 
aaatggataactgttgaagaagcttcaaaaaagcttaaagaatttaatactaaatttgaa 
caattattaaagcgttttgatgagcaaccattgtttaacctaaatcaacttagacatact 

tctaataaaatcgcagaaataattaaggaggatagataa

Sequence 800 

VAQSYSTHVRSGMGGFKGRALKKYDVIATQVNHNYKTNLGKTIDFSSIPDNNYIHVIEGP 
QINEFDEETIAKFVNSDFKISDQSDRMGYRLKGNTVPPKNSADIISEPVALGSIQVPNDG 
55 NPIILLNDKQTIGGYTKIATVTQLDLRKLAQMKPGDIIQFKWITVEEASKKLKEFNTKFE 
QLLKRFDEQPLFNLNQLRHTSNKIAEIIKEDR* 

Sequence 801 

Contig_0517_pos_6667_6227,

putative peptide of unknown function

atggatattaaaaaaattgaggaagtcattaaattggtaaaagctaatgatgtaaaaaaa 
tttaagtataaggactctcataatgaaatagaacttgattttactaatggagcatctcaa
caacattcgcaacaatcatctcaagatattcaacaagagaatattaaatccttagatgaa 
5 aagcaagagtccatatcaaatgaccagcaagagattaaatctcctatggttggaacattt 
ttcttacaagatagtaaagaactaacagaacctaagattaaagttggcgatactgtaact 
gaaggagatattatcggttacattgaagctatgaaagttatgaatgaagtaactacggat 
gttactggtgaggtcactgaaatattagtagaacatggagacaatgttgaatatgatcag 
ctactagtcagagttaaatag 


Sequence 802 

MDIKKIEEVIKLVKANDVKKFKYKDSHNEIELDFTNGASQQHSQQSSQDIQQENIKSLDE 
KQESISNDQQEIKSPMVGTFFLQDSKELTEPKIKVGDTVTEGDIIGYIEAMKVMNEVTTD 
VTGEVTEILVEHGDNVEYDQLLVRVK*


Sequence 803 

Contig_0517_pos_6216_4855,

is similar to (with p-value 0.0e+00)

>sp:sp|P49787|ACCC_BACSU BIOTIN CARBOXYLASE (EC 6.3.4.14) (A SUBUNIT
OF ACETYL-COA CARBOXYLASE (EC 6.4.1.2)) (ACC). >gp:gp|U36245|BSU36245_2
Bacillus subtilis biotin carboxyl carrier protein (accB) and biotin
carboxylase (accC) genes, complete cds. NID: g1055244.

atgatgtatcgatgtttgattgcaaatagaggcgaaatagcagtaagaattataagagct 

25 tgtagagagcttaacatagagacagttgccatttatgcaaaaggtgatgaaaatagctta 
catgtaagtttagccgatcaagcaatatgcataggtgaagcaaatccattagacagttat 
ttaaatattgatcgtattatatctgctgcaaaagttacagaatcaaacgtaattcaccct 
ggctatggtttcttatcggaatctacgaattttgcgaaagccgttgaagacaatcatata 
cattttattggacctagtaagacaactatggaaatgatgggggataaaattactgccaga 

30 caaactgttaaacaagcaggagtacctgttataccaggttctaatgatgctgttcaaagt 
gtagatgaaattaaattattatccaaagaaataggatttccagttgtactaaaagcagct 
agtggtggtggtgggaaaggcatcagaattgttaaagaagcatctcatttggatcaggct 
ttgaaagaagctaaaagtgaaggacaaaaatattttaatgatgatcgagtgtatgtagag 
gcgttcataccagtagcaaaacatgtagaagtgcagattatcggagacggtaaaaataac 

35 tatgttcacttaggtgaacgcgattgttctgttcaacgaaagaatcaaaaattaatagaa 
gaagcgccttgtgctgcattaactgaagaaagaagaacaagaatatgtggcgacgcagtt 
aaagtagctcaagcttcaagatatcgtagtgctggaacaatagaatttttagttacagaa 
gatgcacattattttattgaaatgaatgctcgtattcaagttgaacatacagttacagaa 
atgcgtgctgatagagacctattacaagctcagttatatttattaacacacggtgaatta 

40 ccattcactcagaaagatattttatttaatggtcatgtaattgaggcgcgtataaatgct 
gaaaatcctgaaaaaaactttttacccactccaggaaaagttaataaattacacttacca 
caaggatttaatatacgtgtagattctttactttacacaggttatcaggtttctccttat 
tatgattcacttgtagctaaagtgattgtaaaggattctaatagacaaactgctattaat 
aaattaaaagttgcgttagatgaaatggtcatcgaaggttttactactacagctgacttt 

45 ttatatgcggttttaaattatccaatatatgcaaaaggcgatgccagtaaagtagatata 
aaatttcttgaaaaacatcaaatcattaaagaggtgaaatga 

Sequence 804 

MMYRCLIANRGEIAVRIIRACRELNIETVAIYAKGDENSLHVSLADQAICIGEANPLDSY 
LNIDRIISAAKVTESNVIHPGYGFLSESTNFAKAVEDNHIHFIGPSKTTMEMMGDKITAR
QTVKQAGVPVIPGSNDAVQSVDEIKLLSKEIGFPVVLKAASGGGGKGIRIVKEASHLDQA
LKEAKSEGQKYFNDDRVYVEAFIPVAKHVEVQIIGDGKNNYVHLGERDCSVQRKNQKLIE
EAPCAALTEERRTRICGDAVKVAQASRYRSAGTIEFLVTEDAHYFIEMNARIQVEHTVTE 
MRADRDLLQAQLYLLTHGELPFTQKDILFNGHVIEARINAENPEKNFLPTPGKVNKLHLP 
55 QGFNIRVDSLLYTGYQVSPYYDSLVAKVIVKDSNRQTAINKLKVALDEMVIEGFTTTADF 
LYAVLNYPIYAKGDASKVDIKFLEKHQIIKEVK* 

Sequence 805 

Contig_0517_pos_4088_2844,

is similar to (with p-value 2.0e-62)

>pir:pir|G64138|G64138 branched chain aa transport system II carrier
protein (braB) homolog - Haemophilus influenzae (strain Rd KW20)
>gp:gp|U32845|U32845_9 Haemophilus influenzae Rd section 160 of 163 of
the complete genome. NID: g3212236.

atgggggaaaatacaaaacaagatttcaatcaaaaaggacaaaattttaaattcacaaaa 
aaacatagacgattattatatggttcagtttttttaatggctacatcagctattggtcca 
gcatttctgactcaaactgcagtgtttactgcacaattttatgctagttttgcatttgca 
atattaatttctattattatagatataggcgctcaaataaatatttggagaatattagtg 

10 gtaactggattacgtggacaagaaatatctaataaagtattacctggacttggtactatt 
atctccatactaattgcatttggtggtctcgcatttaacataggtaatattgctggtgca 
ggtttaggtttaaatgcaatgtttggtcttgatgtaaaatggggtgctgcaataacagct 
atttttgcgatacttatctttgttagtagaagtggtcagaaaataatggatgttattagt 
atgattctaggtatcgtaatgattttagtagtcgcttatgtcatggttgtttcaaatccc 

15 ccttatggagatgcattagtacatacatttgcacctgaacatcctttcaaacttatatta 
cctataattacattagttggtggtacagtagggggttatattacttttgcaggtgcacat 
agaattctagattctggtataaaaggtaagtcataccttcctttcgtaaatcgatctgct 
gtagcaggtattttaacaactggtgtcatgcgcaccttattgtttttagctgtactaggt 
gttgttgtaactggcgttacgcttagttcagaaaatccaccagcatcagttttccaacat 

20 gcattaggtcctataggtaaaaatatttttggcgtagtaatatttgcagcagcaatgtcc 
tcagtaattggttctgcatatacaagcgcaacatttttaaaaacactacacaaatcgtta 
ctcaataaaaataatcttatcgttattacatttattgtaatttcaacttttgttttctta 
tttattggtaaaccggtgagtttacttataatagctggtgcgattaatggttggattcta 
ccaatcacattaggtgcaattctcattgcaagtaggaaaaaatctatcgttggtaattac 

caacacccaacatggatgcttgtttttggtattatagccgtaattgtcacaataatgact
ggtatcttttcattacaagatttagcaagtctttggaaaggttaa 

Sequence 806 

MGENTKQDFNQKGQNFKFTKKHRRLLYGSVFLMATSAIGPAFLTQTAVFTAQFYASFAFA 
ILISIIIDIGAQINIWRILVVTGLRGQEISNKVLPGLGTIISILIAFGGLAFNIGNIAGA
GLGLNAMFGLDVKWGAAITAIFAILIFVSRSGQKIMDVISMILGIVMILVVAYVMVVSNP
PYGDALVHTFAPEHPFKLILPIITLVGGTVGGYITFAGAHRILDSGIKGKSYLPFVNRSA 
VAGILTTGVMRTLLFLAVLGVVVTGVTLSSENPPASVFQHALGPIGKNIFGVVIFAAAMS
SVIGSAYTSATFLKTLHKSLLNKNNLIVITFIVISTFVFLFIGKPVSLLIIAGAINGWIL 
35 PITLGAILIASRKKSIVGNYQHPTWMLVFGIIAVIVTIMTGIFSLQDLASLWKG* 

Sequence 807 

Contig_0517_pos_1915_1412, 

is similar to (with p-value 4.0e-46)

>sp:sp|P54452|YQEG_BACSU HYPOTHETICAL 20.1 KD PROTEIN IN NUCB-AROD
INTERGENIC REGION. >gp:gp|D84432|BACJH642_91 Bacillus subtilis DNA,
283 Kb region containing skin element. NID: g2627063.
>gp:gp|Z99117|BSUB0014_48 Bacillus subtilis complete genome (section
14 of 21): from 2599451 to 2812870. NID: g2634966.

atgccaaatgcatatgtgaaatcaatatttgaaattgatatagaaaaacttgccgatagt 
ggtgttaaaggtatcataactgatttagataatacacttgttggttgggatgttaaagaa 
cctactaagggtgttaaatcatggtttgctaaggctaaagatttaggaataactgtcaca 
attgtgtcaaataataataaaagtcgagtatcaagtttctcaagtaatttaggtgtagat 
50 tatatattcaaagcacgtaaaccgatggggaaagcctttaagatggctattaaaaaaatg 
aaaattcaaccgagagaaaccgttgttgtaggagatcaaatgcttactgatgtgtttggt 
ggcaattgtaatggtttatatacaattatggtagtacctgttaaacggactgatggatta 
attacaaagtttaatcgattaattgaaagacgattattaaatcattttagaaaaaaaggt 
tatattaaatgggaggaaaattga


Sequence 808 

MPNAYVKSIFEIDIEKLADSGVKGIITDLDNTLVGWDVKEPTKGVKSWFAKAKDLGITVT
IVSNNNKSRVSSFSSNLGVDYIFKARKPMGKAFKMAIKKMKIQPRETVVVGDQMLTDVFG
GNCNGLYTIMVVPVKRTDGLITKFNRLIERRLLNHFRKKGYIKWEEN*

Sequence 809

Contig_0517_pos_805_311,

is similar to (with p-value 6.0e-39)

>sp:sp|P54453|YQEH_BACSU HYPOTHETICAL 41.0 KD PROTEIN IN NUCB-AROD
INTERGENIC REGION. >gp:gp|D84432|BACJH642_92 Bacillus subtilis DNA,
283 Kb region containing skin element. NID: g2627063.
>gp:gp|Z99117|BSUB0014_47 Bacillus subtilis complete genome (section
14 of 21): from 2599451 to 2812870. NID: g2634966.

atgatagatattccattagacgaaaaatcatttatgtttgatacaccaggtatcattcaa 
tcacatcaaatgacaaattatgtatatgaaaatgagttgaaaatcattatacctaaaaat 
gaaataaagcaacgtgtgtatcaacttaatgaaaaacagacattatttttcggaggattg
gcacgcattgattatgtatctggtggtaaaagaccacttgtttgtttcttttcaaatgat 
15 ttaaatattcatagaactaaaaccgagaaagctaatgatttatggaaatcccaattaggc 
gcattgctttcaccgcctcaagatgcacaacaatttaatcttaatgatgtaaaagcagta 
agactggaaactggtaaaactaaacgtgacatcatgatatctggtttaggattcataact 
attgatgctggtgcaaaagtgatagttcgtgttccaaaacatgtagatgttattttaaga 
aattcaattctttaa 


Sequence 810 

MIDIPLDEKSFMFDTPGIIQSHQMTNYVYENELKIIIPKNEIKQRVYQLNEKQTLFFGGL
ARIDYVSGGKRPLVCFFSNDLNIHRTKTEKANDLWKSQLGALLSPPQDAQQFNLNDVKAV
RLETGKTKRDIMISGLGFITIDAGAKVIVRVPKHVDVILRNSIL* 


Sequence 811 

Contig_0517_pos_0_304,

putative peptide of unknown function 

gtgataaaagtgaaatttgcagtaattggaaaccccatttctcattcattatcgccattg 
30 atgcatcatgctaattttcaatctttaaatttggaaaacacgtatgaagcgataaatgta 
ccagttaatcaatttcaagacattaaaaaaataatttcagaaaagagtattgatggattc 
aatgttactattccacataaagaacgtattattccgtacctagatgatattaatgaacaa 
gcgaaatctgttggggcggtaaatacagttttagttaaagatggtaagtggattggttat 
aata 


Sequence 812 

VIKVKFAVIGNPISHSLSPLMHHANFQSLNLENTYEAINVPVNQFQDIKKIISEKSIDGF
NVTIPHKERIIPYLDDINEQAKSVGAVNTVLVKDGKWIGYNX

Sequence 813

Contig_0518_pos_2682_1942,

is similar to (with p-value 1.0e-70)

>sp:sp|P32816|GLDA_BACST GLYCEROL DEHYDROGENASE (EC 1.1.1.6) (GLDH).
>pir:pir|JQ1474|JQ1474 glycerol dehydrogenase (EC 1.1.1.6) - Bacillus
stearothermophilus >gp:gp|M65289|BACGLDA_2 Bacillus stearothermophilus
glycerol dehydrogenase (proposed gld) gene, complete cds. NID: g142976.

atggatgcaccaacagcagcagtatctgttatttataacgaagatggatcatttagtggt 
tatgaattctaccctaaaaaccctgatacagttatcgtagattctgaaattgttgcacaa 

gcacctgtacgtttatttgcatcaggtatgagtgatggtttagcaacattaatcgaagtt
gaatctacacttcgtagacaagggcaaaacatgttccatggcaaacctacattagcaagt 
ttagcaatcgctcaaaaatgtgaagaggttatttttgaatatggttacagtgcttatact 
tctgtagaaaaacatatcgtgacaccacaagtagatgctgtgattgaagccaatacatta 
ctttcaggtttaggatttgaaaacggcggattagcaggtgcacacgcaattcataatgga 

ttcacagctttagaaggggatatccaccacttaactcatggtgaaaaagtggcatacggt
attttagtacaattagtacttgaaaatgcgccaactgaaaaattcatgaaatacaaaaca 
ttcttcgataatatcaatatgccaacaacattagaaggtcttcacattgaaaacacaagt 
tatgaagaattagttcaagtaggtgaacgtgcattaacaccaaatgatacgtttgctaac 
ttaagtgataaaatcactgctgatgaaatcgcagacgcaattttaactgttaatgattta
tctaaaagtcagttcaactaa
Sequence 814 

MDAPTAAVSVIYNEDGSFSGYEFYPKNPDTVIVDSEIVAQAPVRLFASGMSDGLATLIEV
ESTLRRQGQNMFHGKPTLASLAIAQKCEEVIFEYGYSAYTSVEKHIVTPQVDAVIEANTL
LSGLGFENGGLAGAHAIHNGFTALEGDIHHLTHGEKVAYGILVQLVLENAPTEKFMKYKT 
FFDNINMPTTLEGLHIENTSYEELVQVGERALTPNDTFANLSDKITADEIADAILTVNDL 
SKSQFN* 

Sequence 815

Contig_0518_pos_909_334,

putative peptide of unknown function 

atgaatgtagcagatatcaaagcacgcttattagatttagaaaatacttttaaagaaaaa 
gaaagtgaactgactgatttagacagagctatcggtgatggagatcatggtgtaaatatg 

gtcagaggtttcgaacacttaaaagaaaaaatagatgatcaaagtatgcaagcgctattt
aaatcaacaggtatgacattaatgtctaacgtaggtggtgcttctggaccattatacggg 
tttggttttatcaaaatggcgagtgcagtgaatgatgaaattgatcatgataatcttaaa 
gaggtacttaaagcgtttgctgatggcattcaacaacgtggtaaagtcgaattaaatgaa 
aaaacgatgtatgatgttatcgaacgtgcgagagaagctgttgaaaaaaatgaaacagta 

gatctagataaactacaatcatttgctaatgaaaccaaagatatggtagctactaaaggc
cgtgcatcatattttaacgaagcttcaaaaggttatattgatcctggtgcacaaagtagt 
gtttatattcttaatgcaattataggaggagagtaa 

Sequence 816 

MNVADIKARLLDLENTFKEKESELTDLDRAIGDGDHGVNMVRGFEHLKEKIDDQSMQALF
KSTGMTLMSNVGGASGPLYGFGFIKMASAVNDEIDHDNLKEVLKAFADGIQQRGKVELNE 
KTMYDVIERAREAVEKNETVDLDKLQSFANETKDMVATKGRASYFNEASKGYIDPGAQSS 
VYILNAIIGGE*

Sequence 817

Contig_0518_pos_0_330,

putative peptide of unknown function 

atgacatctatagtagtagtaagtcatagtcataaaatcgcagaaggtgttaaacaatta 
atcaatcaaatgactgacggtggtgttgaccttattgccgttggtggcttaagtgacgat 
gaaatcggtacatcatttgatcaaatcgtctctgtaattaatggacttgaaaatgatgcg
ctttgtttctatgacatcggttcagcaggtatgaatttagacacagctttagaaatgtac 
gaaggtgaccacaaaattgttaaaatggaagcgccaatcgttgaaggaagctttattgca 
agtgtaggaattaaatcaaatatgagtatA 

Sequence 818

MTSIVVVSHSHKIAEGVKQLINQMTDGGVDLIAVGGLSDDEIGTSFDQIVSVINGLENDA
LCFYDIGSAGMNLDTALEMYEGDHKIVKMEAPIVEGSFIASVGIKSNMSI 

Sequence 819 
Contig_0519_pos_4834_5202,

is similar to (with p-value 8.0e-23)

>gp:gp|AF026147|AF026147_6 Bacillus subtilis YojA (yojA), Yo
jB (yojB), YojC (yojC), YojD (yojD), YojE (yojE), YojF (yojF
), YojG (yojG), YojH (yojH), YojI (yojI), YojJ (yojJ), YojK
(yojK), YojL (yojL), YojM (yojM), YojN (yojN), and YojO (yoj
O) genes, complete cds; and OdhA (odhA) gene, partial cds. N
ID: g3169316. >gp:gp|Z99114|BSUB0011_110 Bacillus subtilis c
omplete genome (section 11 of 21): from 2000171 to 2207900.
NID: g2634230.

gtgaacgtgttggaaccaattaaagaacaagaagtgctagatttattaacttcttactca
aatcagcctgtttacctacacgttgaaacaacaaatggtgcttatgcaaatcatttcgat 
caacgcgtatttaacgctggaacatttttaagaaatattgtcgtgacttttgaacatgca 
caacttaaaggcggcgacaaagatccatatcgtgtaggtcttaaattaaaagatggtggc 
tgggtttacgtgcaaggacttacgcactatgaagttaatgagaataacgaatttttaatt
gcaggttttaattatgaaggacaattggctgctacaatagaaataagtaaacagccattt
actatataa 

Sequence 820 

VNVLEPIKEQEVLDLLTSYSNQPVYLHVETTNGAYANHFDQRVFNAGTFLRNIVVTFEHA
QLKGGDKDPYRVGLKLKDGGWVYVQGLTHYEVNENNEFLIAGFNYEGQLAATIEISKQPF 
TI* 

Sequence 821 
Contig_0519_pos_5218_5883,

is similar to (with p-value 8.0e-26)

>gp:gp|AF026147|AF026147_7 Bacillus subtilis YojA (yojA), Yo
jB (yojB), YojC (yojC), YojD (yojD), YojE (yojE), YojF (yojF
), YojG (yojG), YojH (yojH), YojI (yojI), YojJ (yojJ), YojK
(yojK), YojL (yojL), YojM (yojM), YojN (yojN), and YojO (yoj
O) genes, complete cds; and OdhA (odhA) gene, partial cds. N
ID: g3169316. >gp:gp|Z99114|BSUB0011_109 Bacillus subtilis c
omplete genome (section 11 of 21): from 2000171 to 2207900.
NID: g2634230.

atgactgatgaaagacacgtacttgtgattttcccccatcctgatgatgaaactttttcg
tctgctggaactatcgcaagttatattgaaaaaggtattcccgtcacatatgcatgtctt 
accctaggacaaatgggacgtaatctaggtaaccctccttttgcaacaagagaatcttta 
ccatttatacgtgaacgtgagttagaagaagcatgcaaagcaattgggattacagattta 
aggaaaatggggttaagagataaaactgttgaatttgaaccttacgatcaaatggatcaa 

atgattcaatcacttattgacgaaacaaatccatcattaattatttcgttctatcctaaa
tttgcagttcaccctgatcacgaggcaactgcagaagctgtagtacgtacagttggacgc 
atgcatgaatcagatcgaccccgtcttacacttgtagcgtttagcaatgatgcatcagaa 
attcttggagaacctgatattcaaaatgacatatctcaatatagtgatataaaacttaaa 
gcttttgaagcacatgcttcacaaacaggaccatttttaaaacaacttgctagtcccgaa 

atagatggtcaagcacaaagtttcttaaaaatagagccattttggacatatcactttgaa
tcttaa 

Sequence 822 

MTDERHVLVIFPHPDDETFSSAGTIASYIEKGIPVTYACLTLGQMGRNLGNPPFATRESL
PFIRERELEEACKAIGITDLRKMGLRDKTVEFEPYDQMDQMIQSLIDETNPSLIISFYPK
FAVHPDHEATAEAVVRTVGRMHESDRPRLTLVAFSNDASEILGEPDIQNDISQYSDIKLK
AFEAHASQTGPFLKQLASPEIDGQAQSFLKIEPFWTYHFES*

Sequence 823 
Contig_0519_pos_3532_3086,

is similar to (with p-value 1.0e-29)

>sp:sp|P42405|YCKG_BACSU HYPOTHETICAL 19.0 KD PROTEIN IN TLP
C-SRFAA INTERGENIC REGION (ORF10). >gp:gp|D30762|BACYCK_10 B
acillus subtilis DNA around 28 degrees region of chromosome
containing yckA-H genes. NID: g710627. >gp:gp|D50453|D50453_
49 Bacillus subtilis DNA for 25-36 degree region containing
the amyE-srfA region, complete cds. NID: g1805369.
atggatgcagcagattacgaagtgagccaagcagtaaaatatggtgcagatattgttaca 
attttaggtgttgctgaagatgcttcaattaaagcagcagttgaagaagcgcataaacat 

ggaaaagcattgcttgttgatatgatagcagtgcaaaacttagaacaacgtgctaaagaa
ctagatgagatgggtgcagactatatcgcagttcatacaggttacgacttacaagctgaa 
ggaaaatctccattagacagcttgcgtacagttaaatctgttatcaaaaactctaaggtt 
gcagtagcaggtggtattaaaccagatactatcaaagatattgttgctgaagatccagat 
ttagttattgttggtggcggtattgcgaatgctgacgatcctgtagaagcagcaaaacaa 

tgtagagcagctattgaaggtaaataa

Sequence 824 

MDAADYEVSQAVKYGADIVTILGVAEDASIKAAVEEAHKHGKALLVDMIAVQNLEQRAKE 
LDEMGADYIAVHTGYDLQAEGKSPLDSLRTVKSVIKNSKVAVAGGIKPDTIKDIVAEDPD
LVIVGGGIANADDPVEAAKQCRAAIEGK*
Sequence 825 

Contig_0519_pos_3084_2536, 
is similar to (with p-value 7.0e-35)

>sp:sp|P42404|YCKF_BACSU HYPOTHETICAL 20.0 KD PROTEIN IN TLP
C-SRFAA INTERGENIC REGION (ORF9). >gp:gp|D30762|BACYCK_9 Bac
illus subtilis DNA around 28 degrees region of chromosome co
ntaining yckA-H genes. NID: g710627. >gp:gp|Z99105|BSUB0002_
174 Bacillus subtilis complete genome (section 2 of 21): fro
m 194651 to 415810. NID: g2632457. >gp:gp|D50453|D50453_48 B
acillus subtilis DNA for 25-36 degree region containing the
amyE-srfA region, complete cds. NID: g1805369.
atgagtgaatttaataattatcgtcttattcttgaagagttagattctactttatctcaa 

gtagataatacagagtatgaacgttttgctaatgatgttataggtgcagatcgcatattt
acagctggtaaaggtcgttcaggttttgttgctaatagttttgcaatgcgcttaaatcaa 
ttaggtaaaaatgcctacgttgtaggtgagtcaacaacaccttcaattaaagaacatgat
ttgtttattattatttcaggttcaggttctacagaacatttaagattattagctgaaaaa 
gcacaatctgagggtgcaaaaattgtcttattaactacaaatgcggaatcgccaatcggt 

aatcttgcagagacggttgttgaattgcctgcaggtactaaacatgatgttgagggttcg
aaacaaccacttggtagtttatttgaacaggcttcacttatattcttagatagtgttgta 
ttacctttaatggatgcatttcacattagtgaaaaaacaatgcaagagaatcatgctaat 
ttagaataa 

Sequence 826

MSEFNNYRLILEELDSTLSQVDNTEYERFANDVIGADRIFTAGKGRSGFVANSFAMRLNQ 
LGKNAYVVGESTTPSIKEHDLFIIISGSGSTEHLRLLAEKAQSEGAKIVLLTTNAESPIG
NLAETVVELPAGTKHDVEGSKQPLGSLFEQASLIFLDSVVLPLMDAFHISEKTMQENHAN
LE* 


Sequence 827 

Contig_0519_pos_2414_1767, 

putative peptide of unknown function 

atgaattttgatagttatatttttgattttgatggaacgctaattgatacaacaacatgt 
cacgtcaaagctacgcaaagcgcttttaaaagattaaatttagatgaacctacagaacaa
gctattttacatacatatcatttaaatttatataacaattttaaagcgctagcttcacat 
gaactgtctttttatcaaatagaaaaattaatagatgaatacaatcattgttttagcaac 
gatgaaatacatcaatcaaaagaatataccggaataagtgaagcattaaaatttttacat 
aaccaaaagaaaaaaatatttgtagtgtctaataaagaaatactaacaactcaaaagtat 
ttagattatctcggattaagccgttttataactgattcattaggtgtctgtattaaaaat
gaagacaaacttctttgtgaaacgattcaaaatttgatacagaaacatcatttaatgata 
ggtaaaaccgtgtatataggggacacagcacagaatatcaagagtgcgaatcaagctcat 
gtgcaaacatgcgctgtcacatggggagcacaatctgcacacgaattgttgcatgaaaat 
cctcattatattgttaatgatccagaagaatttttaacaattttataa 


Sequence 828 

MNFDSYIFDFDGTLIDTTTCHVKATQSAFKRLNLDEPTEQAILHTYHLNLYNNFKALASH 
ELSFYQIEKLIDEYNHCFSNDEIHQSKEYTGISEALKFLHNQKKKIFVVSNKEILTTQKY
LDYLGLSRFITDSLGVCIKNEDKLLCETIQNLIQKHHLMIGKTVYIGDTAQNIKSANQAH 
VQTCAVTWGAQSAHELLHENPHYIVNDPEEFLTIL*

Sequence 829 

Contig_0520_pos_483_1154, 
is similar to (with p-value 9.0e-88)
>gp:gp|AF022796|AF022796_2 Staphylococcus carnosus molybdenu
m cofactor biosynthetic gene cluster, complete sequence. NID
: g3955197.

atgcctgatttaacgtccttttggatttcttttcgtgttgctttaatcagtacaatgata 
gttactatttttggcattttgatttctaaatggctatacaataaaaaaagatattgggta
aatctattagaaagttttatcattttaccaattgtgttaccacctactgtccttggtttt
atactattaattatattttcaacaagaagtcctgtaggagaattctttactaatatctta 
cacttaccagttgtatttacattgacaggtgcagtgattgcatctgtcattgttagtttt 
ccccttatgtatcaacatacagtgaatggttttcgaagtatagattcaaagatgttaaat 
actgcaagaacgatgggagcaagtgaaacaaaaatatttcttaaattggtgttaccatta
tctaaacgttctattcttgcaggtattatgatgagctttgcaagagcaataggtgaattt 
ggtgctactttgatggttgctggctatatcccagacaaaacaaatacattgcctttagaa 
atttattttttagtggagcaagggaaagaaaatgaagcatggttatgggtgcttgtatta 
gttgcgtttgcggtaactgtcatagcgaccataaatctggttaatcgtgatacgtttagg 
gaggttgattaa

Sequence 830 

MPDLTSFWISFRVALISTMIVTIFGILISKWLYNKKRYWVNLLESFIILPIVLPPTVLGF
ILLIIFSTRSPVGEFFTNILHLPVVFTLTGAVIASVIVSFPLMYQHTVNGFRSIDSKMLN 
TARTMGASETKIFLKLVLPLSKRSILAGIMMSFARAIGEFGATLMVAGYIPDKTNTLPLE
IYFLVEQGKENEAWLWVLVLVAFAVTVIATINLVNRDTFREVD* 

Sequence 831 

Contig_0520_pos_1155_1775,
is similar to (with p-value 5.0e-73)

>gp:gp|AF022796|AF022796_3 Staphylococcus carnosus molybdenu
m cofactor biosynthetic gene cluster, complete sequence. NID 
: g3955197. 

atgctcacaattaaagtgaatggtgttcttaatcagacgaaaattaatataaatataaag 
gatcaacaccctaagatatatgcgatacagggaccatctggaattggaaagacaacaatt 
ttaaatataattgccggtttgaaagctataaattattcatatataaaggttggtaaacgt 
gtattaactgattcacgacaccatttgaatgttaaggttcaacaacgtcgtataggatat 
ctatttcaagattatcaacttttccccaatatgaatgtttataacaacataacgtttatg 
actaaaccttctgaacatatcaatgaacttattcatactctaaaaatagagcatttactt 
gaaaagtatcctgtgaccttatcaggaggtgaagctcagcgcgtcgctttagcaagggcg 
ctaagtacgaaacccgatttgattttgcttgatgagcctttttcaagtttagatgataaa 
acaaaaaacgaaggtatcaaattaattttaaaaatattcgaagcatggcaaattcctatt 
atatttgtaacgcattcaaattatgaagcgcaacaaatggcgcatgagattataacaatt 
gaagattgtatacaaatatag 

Sequence 832

MLTIKVNGVLNQTKININIKDQHPKIYAIQGPSGIGKTTILNIIAGLKAINYSYIKVGKR
VLTDSRHHLNVKVQQRRIGYLFQDYQLFPNMNVYNNITFMTKPSEHINELIHTLKIEHLL 
EKYPVTLSGGEAQRVALARALSTKPDLILLDEPFSSLDDKTKNEGIKLILKIFEAWQIPI 
IFVTHSNYEAQQMAHEIITIEDCIQI*

Sequence 833 

Contig_0520_pos_1851_2852, 
is similar to (with p-value 0.0e+00)
>gp:gp|AF022796|AF022796_4 Staphylococcus carnosus molybdenu
m cofactor biosynthetic gene cluster, complete sequence. NID 
: g3955197. 

atgcaagaaagatactcgagacaagtgttgtttaaggaaattggtttaaaaggtcaaagt 
ctacttgagaaaaaacatgtgcttatagtaggtatgggtgcgttaggaacacacttagct 

gagggattagtgagagcgggaataaataagttaaccattgttgatagagattacattgaa
tttagcaatttacaacgacaaacgctctttatagagcgtgatgcagaagatgtactacca
aaagtaatcgctgcacagaaagttttaaaagagatacgtaaagatgtagagatagatgct 
tatattgagcatgttaattacaatttcttagagcaacatggcatgcatgtcgatatcata 
ttagatgcaactgataattttgatacacgtcagttaattaatgactttgcttataaacat 

cagattccttggatttatggtggtgttgtacaaagtacatatgttcaggcaacgtttatt
cctggtgaaacaccgtgttttaattgcttaatgcctcaattaccatctattaatttaaca 
tgtgatacggttggagttattcaaccagctgtaacaatgacaaccagtttacaactcgtt 
gatgcattgaagttgctgactggtaataaggttaataaacacttcacttacggggatatt 
tggacaggagatcattatacatttggttttagtcgtatgcaaaatgaagattgtaaaact
tgtggtaatgctccaacatatccacaccttaatcaacatcaacaagattatgcgacctta
tgtggaagagacactgttcaatataaaaatgctgatatttctcaggaaatattactatca 
tttctcgagcgaaatcatattcaatatcgcacgaatttatatatgacaatgtttaggttt 
agagaacatcgaattgttgcattttctggaggtagatttttgatacatggaacgacagaa 
cctaaaaaagcaattcaattaatgcatcaactatttggttaa

Sequence 834 

MQERYSRQVLFKEIGLKGQSLLEKKHVLIVGMGALGTHLAEGLVRAGINKLTIVDRDYIE 
FSNLQRQTLFIERDAEDVLPKVIAAQKVLKEIRKDVEIDAYIEHVNYNFLEQHGMHVDII 
LDATDNFDTRQLINDFAYKHQIPWIYGGVVQSTYVQATFIPGETPCFNCLMPQLPSINLT
CDTVGVIQPAVTMTTSLQLVDALKLLTGNKVNKHFTYGDIWTGDHYTFGFSRMQNEDCKT
CGNAPTYPHLNQHQQDYATLCGRDTVQYKNADISQEILLSFLERNHIQYRTNLYMTMFRF
REHRIVAFSGGRFLIHGTTEPKKAIQLMHQLFG*

Sequence 835

Contig_0520_pos_4068_5273,

is similar to (with p-value 0.0e+00)

>gp:gp|AF022796|AF022796_7 Staphylococcus carnosus molybdenu
m cofactor biosynthetic gene cluster, complete sequence. NID
: g3955197.

atgaaacaacatgttgaagtgaagaatatcaatattaatttagatgaaagtttaggacat 
attcttgctgaagatattgttgcgacctatgatataccaagatttaataaatcaccctac 
gatgggtttgcaattagaagtgaagattcacaaggtgcaagtggcgaaaaccgtattgaa 
tttgaagtaatagatcatatcggtgcaggttcagtttcagaaaaaacaattgataaaaac 

caagcaattcgaataatgactggtgctcaaattccttctggagctgatgccgtagtaatg
tttgaacaaactattgaatctgaaacaacttttacaattagaaaatcctttaaacattta 
gaaaatatttcgctacaaggtgaagaaataaaagctggtgatattgtactacataaaggt 
atgcgtattaactcaggtgtgatagcagtcttagctacatacggttatactaaagtgcga 
gtggctcgaaaaccaactgttgcagtaattgctacaggtagtgaattgcttgaagtagaa 

gatgagcttgaaccaggaaagatacgaaattcaaacggaccaatgattaaagcattagct
aaacaatttggaatacaagttggaatgtataaagttcagcatgataatctcgaaaagagt 
attgaggttgtaaaaaaagctttatcagagcatgatttagtaattactaccggaggtgtg 
tcggtaggagattttgattacttaccagaaatatacaagtccatccaagcacagatacta 
tttaacaaagtggctcaaagaccaggtagtgttactacggttgcatttgcagatggtaaa 

tatttatttggcttatctggaaacccttcagcctgctatacaggatttgaattatatgtc
aaacctgctgtaaataagctcatgggagctaaagcttgttatccgcaaataatcaaagct 
acacttatggaagattttaataaagctaacccatttacacgattgattcgtgctaaggca 
acattaacaaaagctggaatgacagtaataccatctggatttaataaatcaggtgcagtt 
gtagccattgcgcacgctaatgctatgattatgcttcctggtggcacacgtggatttaaa 

gcgggcaacattgttgatgtgattttgaccgaatctaatagttttgaagaggaattgata
ctatga 

Sequence 836 

MKQHVEVKNININLDESLGHILAEDIVATYDIPRFNKSPYDGFAIRSEDSQGASGENRIE 
FEVIDHIGAGSVSEKTIDKNQAIRIMTGAQIPSGADAVVMFEQTIESETTFTIRKSFKHL
ENISLQGEEIKAGDIVLHKGMRINSGVIAVLATYGYTKVRVARKPTVAVIATGSELLEVE 
DELEPGKIRNSNGPMIKALAKQFGIQVGMYKVQHDNLEKSIEVVKKALSEHDLVITTGGV 
SVGDFDYLPEIYKSIQAQILFNKVAQRPGSVTTVAFADGKYLFGLSGNPSACYTGFELYV
KPAVNKLMGAKACYPQIIKATLMEDFNKANPFTRLIRAKATLTKAGMTVIPSGFNKSGAV 
VAIAHANAMIMLPGGTRGFKAGNIVDVILTESNSFEEELIL*

Sequence 837 

Contig_0520_pos_5291_5746,
is similar to (with p-value 4.0e-40)
>gp:gp|AF022796|AF022796_8 Staphylococcus carnosus molybdenu
m cofactor biosynthetic gene cluster, complete sequence. NID
: g3955197.

atgaaaaattcagggaaaaccacattgatgaaccatgctatatcatttttaaaagaacga 
ggctattcagtagtaacaattaaacatcacgggcatattggtgaagaaattgaattacag
tcatctgatgttgaccacatgaaacatttcgctgcgggcgcagaccaaagtattgttcag
gggcatcatttacagcaaacagtgacacgtaaaaagaaacaatcgcttagagaaataata 
gaaaattctgttacaattgattgtagtatcattttagttgagggctttaaagaagcaaat 
tatgataaaattatcgtttataaaaataatgatgaattaagaagtctacaaggactttct 
cacgtcatagggaaaatagaaaccaatcatccacgtgcaagtaatcaacttgagcactta
ctcaataaattaattaaggataagggaatgaattaa 

Sequence 838 

MKNSGKTTLMNHAISFLKERGYSVVTIKHHGHIGEEIELQSSDVDHMKHFAAGADQSIVQ 
GHHLQQTVTRKKKQSLREIIENSVTIDCSIILVEGFKEANYDKIIVYKNNDELRSLQGLS
HVIGKIETNHPRASNQLEHLLNKLIKDKGMN* 

Sequence 839 

Contig_0520_pos_5747_6199, 
is similar to (with p-value 6.0e-70)

>gp:gp|AF022796|AF022796_9 Staphylococcus carnosus molybdenu
m cofactor biosynthetic gene cluster, complete sequence. NID
: g3955197.

atgaagcaatttgaaatcgtgactcaacctattgaaacagaacaatatagggattttacg 
attaacgaacgtcaaggtgccgtagtcgtatttactggtcacgtaagagagtggactaaa
ggtattcgtacacaacatttagagtatgaagcttatataccaatggctgagaaaaaatta 
gctcaaattggtaaagaaattgaagaaaagtggcctggaacaataacaacaattgtacat 
cgaattggtccattacaaatatcagatattgcagttttaattgcagtatcttcaccgcat 
agaaaagcagcatatgcagcgaatgaatacgccatcgagcgcataaaggaaattgttcca 
atttggaaaaaggaaatttgggaagatggtgctgaatggcaaggtcatcaaaagggaaca
tataatgaagcaaaaaaggggaaagcaagatga 

Sequence 840 

MKQFEIVTQPIETEQYRDFTINERQGAVVVFTGHVREWTKGIRTQHLEYEAYIPMAEKKL
AQIGKEIEEKWPGTITTIVHRIGPLQISDIAVLIAVSSPHRKAAYAANEYAIERIKEIVP
IWKKEIWEDGAEWQGHQKGTYNEAKKGKAR* 

Sequence 841 

Contig_0520_pos_6555_7040,
is similar to (with p-value 8.0e-39)

>gp:gp|AF022796|AF022796_11 Staphylococcus carnosus molybden
um cofactor biosynthetic gene cluster, complete sequence. NI
D: g3955197.

atgtttaatcgcattatcattagcactaattcccaattagcttctcagtttgaatatgaa 
tatgtgattattgatgacgaacatcatcaaaataaagggccgctaacaggaatttactca
gtgatgaaacaatacatggatgaagaattgtttttcattgtatctgttgatacaccaatg 
attacaagtaaagcagtgaatgggttatatcatttcatggtatcaaacttaattgaatca 
cgtttagatattgtcgcatttaaagaaggagaaatatgtataccgacgattggtttttat 
acactttcgacgtttccttttattgaaaaagctttaaattcaaatcatttaagtctgaag 
catgtctttaaacaattatcgacagattggttagatgttactgaaattgactcgccttat
tattggtataagaatattaattttcagcatgatttggactctttaaaaatgcagataaat 
gaataa 

Sequence 842 

MFNRIIISTNSQLASQFEYEYVIIDDEHHQNKGPLTGIYSVMKQYMDEELFFIVSVDTPM
ITSKAVNGLYHFMVSNLIESRLDIVAFKEGEICIPTIGFYTLSTFPFIEKALNSNHLSLK 
HVFKQLSTDWLDVTEIDSPYYWYKNINFQHDLDSLKMQINE* 

Sequence 843
Contig_0520_pos_7053_0,

is similar to (with p-value 6.0e-63)

>gp:gp|AF022796|AF022796_12 Staphylococcus carnosus molybden
um cofactor biosynthetic gene cluster, complete sequence. NI
D: g3955197.
atgaaagaggtaatacaagataaattaggccgtccaatacgggatttaagaatatcggtc
actgatcgatgtaatttcagatgtgattattgtatgccaaaggaaatctttggagatgat 
tacactttcttacctaagaatgaattgcttacttttgaagaattaacacgaatttcaaag 
atttatgctcaattaggagttaaaaagataagaattacaggaggagagcctctcttacga 
cgcaatctttataaacttgtagagcaattaaatctcatagatggtatagaggatattgga
ttgactactaatggcttgttattaaaaaaacatggaaaaaatttatatcaagctggttta 
cgacgtattaatgtaagtttagatgcgattgaggataacgtttttcaagaaattaacaat 
agaaatattaaagcgtctacaatcttagaacaaattgattatgcagtatcaataggtttt 
gaagttaaagtaaac 


Sequence 844 

MKEVIQDKLGRPIRDLRISVTDRCNFRCDYCMPKEIFGDDYTFLPKNELLTFEELTRISK 
IYAQLGVKKIRITGGEPLLRRNLYKLVEQLNLIDGIEDIGLTTNGLLLKKHGKNLYQAGL 
RRINVSLDAIEDNVFQEINNRNIKASTILEQIDYAVSIGFEVKVN 


Sequence 845 

Contig_0520_pos_6172_5867,

is similar to (with p-value 5.0e-48)

>gp:gp|AF022796|AF022796_9 Staphylococcus carnosus molybdenu
m cofactor biosynthetic gene cluster, complete sequence. NID
: g3955197.

atgttcccttttgatgaccttgccattcagcaccatcttcccaaatttcctttttccaaa 
ttggaacaatttcctttatgcgctcgatggcgtattcattcgctgcatatgctgcttttc 
tatgcggtgaagatactgcaattaaaactgcaatatctgatatttgtaatggaccaattc 
gatgtacaattgttgttattgttccaggccacttttcttcaatttctttaccaatttgag
ctaattttttctcagccattggtatataagcttcatactctaaatgttgtgtacgaatac
ctttag 

Sequence 846 

MFPFDDLAIQHHLPKFPFSKLEQFPLCARWRIHSLHMLLFYAVKILQLKLQYLIFVMDQF
DVQLLLLFQATFLQFLYQFELIFSQPLVYKLHTLNVVYEYL*

Sequence 847

Contig_0521_pos_1712_2014, 

is similar to (with p-value 1.0e-29)

>sp:sp|P37547|YABF_BACSU HYPOTHETICAL 20.7 KD PROTEIN IN MET
S-KSGA INTERGENIC REGION. >gp:gp|D26185|BAC180K_104 B. subti
lis DNA, 180 kilobase region of replication origin. NID: g46
7326. >gp:gp|Z99104|BSUB0001_41 Bacillus subtilis complete g
enome (section 1 of 21): from 1 to 213080. NID: g2632267.

atgtttgcatgttctattcctattttacctcttttacttttggccttttctctatctaca 
tatgcgtgtttaacaccagaaacatgttcccgtatagtatttctaattttatcacctggg
aaatctggatctgtcagtacaatcacacctctagtttgctgcgcgtgttgtatcacttct 
aaagtatatgtatcaattgcactaccgtttgtttcaatagtatcacaatctacagcactt 

tttactcgttcagtatcatctttaccttcaacaacaataaattcgtttattttcataaat
taa 

Sequence 848 

MFACSIPILPLLLLAFSLSTYACLTPETCSRIVFLILSPGKSGSVSTITPLVCCACCITS 
KVYVSIALPFVSIVSQSTALFTRSVSSLPSTTINSFIFIN*

Sequence 849

Contig_0521_pos_8707_8264,
putative peptide of unknown function
atgtcaatagcaagaccggtagcagaaatgctaagtacctatacaactcaaatagaagca
gcatcgtcgttaaatgaggaatttgatttagttactttaagaaaatcaatcatcagatgg 
tgtcaattaatattatctaataaagctatggctcttataggtgtcattgagttattaaaa 
caagctaagaatcgtaaattacaattacttactttatcagccgtcaatggctttttcgaa 
gatataatgcatgcaaagatagaaatggataattattatacatttagtgatttaactgaa
gaaattgaaaattatgcaaatcaattaacttttaatcaattaatcctcatgtatgatcag
attactgaagcgcataaaaagttaaatcaaaatgttaatccaacacttgtttttgaacaa 
atagtaataaaaggtgtgatttaa 

Sequence 850

MSIARPVAEMLSTYTTQIEAASSLNEEFDLVTLRKSIIRWCQLILSNKAMALIGVIELLK 
QAKNRKLQLLTLSAVNGFFEDIMHAKIEMDNYYTFSDLTEEIENYANQLTFNQLILMYDQ 
ITEAHKKLNQNVNPTLVFEQIVIKGVI* 

Sequence 851

Contig_0521_pos_8262_7459,

is similar to (with p-value 3.0e-88)

>sp:sp|P37541|YAAT_BACSU HYPOTHETICAL 31.2 KD PROTEIN IN XPA
C-ABRB INTERGENIC REGION. >gp:gp|D26185|BAC180K_96 B. subtil
is DNA, 180 kilobase region of replication origin. NID: g467
326. >gp:gp|Z99104|BSUB0001_32 Bacillus subtilis complete ge
nome (section 1 of 21): from 1 to 213080. NID: g2632267.
atgcccaatgttgtaggtgttcagtttcaaaaagcagggaaattagaatactacgcgccg 
aatcaattagatgtagaggttggtgactgggttgttgtccaatctaaaagaggtatagaa 

attggccacgtaaagtttccattacgtgaagttgatgtagaagatgtcacattaccgcta
aaaaatatcattcgtaaaatgaatgaagatgatcaagaaacatattatcgtaatgaacgc 
gatgccaatgatgcgttagaattatgtaaaaaagtagttaaagatcagcaattagatatg 
cgattagttaattgtgaatatacattagataaatctaaagtgatttttaattttaccgca 
gatgatcgcattgattttcgcaaacttgttaaagttttagctcaaaatctaaagactaga 

atagaattacgtcaaattggggtaagagatgaagcgaaattattgggtggtatcggtcct
tgtggacgttctttatgttgttctacatttttaggagatttcgaacctgtatccattaaa 
atggcgaaagatcagaacctatcattaaatccaactaagatttcaggagcttgtggtaga 
ttgatgtgttgtcttaaatatgaaaatgactactatgaagaggctcgaactcaattacct 
gatgttggagatatgattcaaacaccagatggtcacggaaaagtgataggattaaatatt 

ttagatatttctatgcaagttaaaatagagggtctagaacaacctttagaatataaaatg
gaagagatagaagtattgaattaa 

Sequence 852 

MPNVVGVQFQKAGKLEYYAPNQLDVEVGDWVVVQSKRGIEIGHVKFPLREVDVEDVTLPL 
KNIIRKMNEDDQETYYRNERDANDALELCKKVVKDQQLDMRLVNCEYTLDKSKVIFNFTA
DDRIDFRKLVKVLAQNLKTRIELRQIGVRDEAKLLGGIGPCGRSLCCSTFLGDFEPVSIK 
MAKDQNLSLNPTKISGACGRLMCCLKYENDYYEEARTQLPDVGDMIQTPDGHGKVIGLNI 
LDISMQVKIEGLEQPLEYKMEEIEVLN* 

Sequence 853

Contig_0521_pos_7412_7095, 

putative peptide of unknown function 

atgaaattagaacatcacgttgaacaacttacaaccgacatgtcagaacttaaagattta 
acagtcgaacttgttgaggagaatgttgctttgcaagttgaaaatgaaaatttaaaacga 
ttgatgaacaaaactgaagaatcggttgaaactcacttagataaagataattataagcat
gtaaaaacaccatctccaagtaaagataatttagcaatgttatatcgtgaaggttttcat
atttgtaagggtgaattattcgggaaacatcgtcatggtgaagattgcttattatgcctt 
aatgtgttgagtgattaa 

Sequence 854

MKLEHHVEQLTTDMSELKDLTVELVEENVALQVENENLKRLMNKTEESVETHLDKDNYKH 
VKTPSPSKDNLAMLYREGFHICKGELFGKHRHGEDCLLCLNVLSD* 

Sequence 855 
Contig_0521_pos_6999_6274,

is similar to (with p-value 2.0e-54)

>sp:sp|P37543|YABB_BACSU HYPOTHETICAL 28.3 KD PROTEIN IN XPA
C-ABRB INTERGENIC REGION. >gp:gp|D26185|BAC180K_98 B. subtil
is DNA, 180 kilobase region of replication origin. NID: g467
326. >gp:gp|Z99104|BSUB0001_34 Bacillus subtilis complete ge
nome (section 1 of 21): from 1 to 213080. NID: g2632267.
atgttagaagataatgaacgcgttgaccatttaataaaagaaggatatgagattatacaa 
aatgatgaagtattctctttttctactgatgccctattgttagggtatttaaccgaagtt 
cgcaaaaatgataaagttatggatttatgctccggcaatggtgttattccattattatta
gctgctaaatcaactcaacctattgaaggaatagaaatacaagagcaactagtgagtatg 
gcacgtcgcagttttaaattgaatgatttgaacgatagactaactatgcaccatatggat 
ttaaaagatgtatatcaaacatttcaacctgctcaatatacattagtgacttgtaatcct 
ccttattttaaaatgaatcaaaatcatcaacatcaaaaagaagcacataaaatagcacgt 

cacgaaataatgtgtaatcttaaagattgtattgaagctgcaagacatttacttaaagag
ggtggtcgttttattatggttcatcgagcggaaaggctaatggatgtcttaaccgaatta 
agacatggtaaaattgagcctaaagcactgacgttagtgtatagtaaacatgataagcct 
gcacaaacaattgttgtggaaggaagaaaaggtggtaaccaaggtttagatatacgtaat 
ccattatacatatataatgaggatggatcatatagcgatgagatgaaaggtgtttattat 

ggataa

Sequence 856 

MLEDNERVDHLIKEGYEIIQNDEVFSFSTDALLLGYLTEVRKNDKVMDLCSGNGVIPLLL
AAKSTQPIEGIEIQEQLVSMARRSFKLNDLNDRLTMHHMDLKDVYQTFQPAQYTLVTCNP 
PYFKMNQNHQHQKEAHKIARHEIMCNLKDCIEAARHLLKEGGRFIMVHRAERLMDVLTEL
RHGKIEPKALTLVYSKHDKPAQTIVVEGRKGGNQGLDIRNPLYIYNEDGSYSDEMKGVYY
G* 

Sequence 857 
Contig_0521_pos_6031_5192,

is similar to (with p-value 2.0e-67)

>sp:sp|P37544|YABC_BACSU HYPOTHETICAL 33.0 KD PROTEIN IN XPA
C-ABRB INTERGENIC REGION. >gp:gp|D26185|BAC180K_99 B. subtil
is DNA, 180 kilobase region of replication origin. NID: g467
326. >gp:gp|Z99104|BSUB0001_36 Bacillus subtilis complete ge
nome (section 1 of 21): from 1 to 213080. NID: g2632267.
atgacaactttatatttagtaggaacaccaattggaaatcttggtgatattacatttcga 
gctatagaaacattaaaaaaagttgatgtgattgcatgtgaagatacacgtgtaacacgg 
aaattgtgtaatcattatgaaatacaaacacctctaaagtcgtatcatgaacataataaa 

gaacaacaaactgactatttaatcaagcagttacaaactggcttaaatatagcgttagta
tcagatgctgggttgccattaattagtgatccaggatatgaattggttgtcgaagcacgt 
aaaaataatataaatatagaaacagtaccaggtcctaatgctgggttgactgcacttatg 
tcaagtggattaccatctttcacatacacatttttaggttttttgccaagaaaagaaaaa 
gaaaaaattgaagtgcttgaggatagaatgtttcaaaatagtactttaatactttatgaa 

tcgccttatagggttactgatactttgaaagcaatagctaaaatagattcacaaagatgg
attactgttggtagagagctaacgaagaaatttgaacaagttcttacacttacagttgat 
gatatgttgaaattgattaatcatgacaaattacctcttaaaggtgagtttgtgatactg 
attgaaggtgcattacctaagagtggtgaatcatggtttgaaagctatacggttaaagaa 
catgttgattattatattgaaaccaaacatgttaaacctaaaaaagcaattaaatttgtc 

gctacagatcgacatatgaagacgggtgacatatataatatttatcataatattgattaa



Sequence 858 

MTTLYLVGTPIGNLGDITFRAIETLKKVDVIACEDTRVTRKLCNHYEIQTPLKSYHEHNK 
EQQTDYLIKQLQTGLNIALVSDAGLPLISDPGYELVVEARKNNINIETVPGPNAGLTALM
SSGLPSFTYTFLGFLPRKEKEKIEVLEDRMFQNSTLILYESPYRVTDTLKAIAKIDSQRW 
ITVGRELTKKFEQVLTLTVDDMLKLINHDKLPLKGEFVILIEGALPKSGESWFESYTVKE 
HVDYYIETKHVKPKKAIKFVATDRHMKTGDIYNIYHNID*

Sequence 859

Contig_0521_pos_4868_3012,

is similar to (with p-value 0.0e+00)

>sp:sp|P23920|SYM_BACST METHIONYL-TRNA SYNTHETASE (EC 6.1.1.
10) (METHIONINE--TRNA LIGASE) (METRS). >pir:pir|S16682|S1668






2 methionine--tRNA ligase (EC 6.1.1.10) - Bacillus stearothe
rmophilus >gp:gp|X57925|BSMETSG_1 B. stearothermophilus metS
gene for methionyl-tRNA synthetase. NID: g39988.
atgcaaggctatgatgttcgttatttaactggcactgatgagcacggtcaaaaaatccaa 
gaaaaagctcaaaaagctggcaaaacagaactagaatacttagatgaaatgatttcaggt
attaaaaacttatggagtaaacttgagatttctaatgatgattttattcgaactacagaa 
gagcgtcataagcaagtcgttgagaaagtgtttgagcgattattaaaacaaggtgacatt 
tatttaggtgaatacgaaggttggtattctgttcctgatgaaacatattatacagagtca 
caacttgttgaccctgtttatgaaaacggcaaaattgtaggtggtaaaagtcctgattct 

ggtcacgaagtcgaacttgtaaaagaagaaagctatttcttcaacattaataaatataca
gaccgcttattagaattttacgatgaaaatccagactttatacaaccaccatctagaaaa
aatgaaatgattaataactttatcaaaccaggtttagaagatttagcagtatcacgtaca 
tcattcgattggggtgtacgtgtaccatctaatcctaaacatgttgtatacgtgtggatt 
gatgcacttgttaattatatttcttcattaggttatctatctgatgatgaaacattattt 

aataaatattggccagcagacatacacttgatggctaaagaaattgtacgtttccactct
attatatggccaatattgttaatggcgttggatttaccacttcctaaaaaagtttttgca 
cacggttggattttaatgaaagatggtaaaatgagtaaatctaaaggtaatgtcgtagat 
cctaatgtattaattgatcgttatggtcttgatgcgacacgttattacttaatgcgtgaa 
ttaccgtttggttctgatggcgtatttacaccggaagcctttgttgaaagaacaaattac 

gatcttgcgaatgatttaggtaatctagtgaatcgtactatctctatgataaacaaatat
ttccacggcgaattacctgcataccaaggtccaaaacatgaattggatgaaaaaatggaa 
gcgatggcgcttgaaactgttaaatcattcaatgataatatggaaagtttacaattttct 
gttgctttatcaacagtatggaaatttattagtcgtacaaacaaatatattgatgaaact 
caaccttgggttcttgcaaaagatgaaaatcaacgtgagatgcttggtaatgtaatggca 

catcttgtcgagaacattcgtttcgctacaatcttattacaaccattcttgacgcatgca
cctagagagatatttaagcaacttaatattaacaatccggatttacatcaattagatagt 
ctgcaacaatatggtatgttgtcagaggcaattactgtaactgaaaagccaacaccaatt
ttcccaagattagacactgaagcagaaattgcttatatcaaagaatcaatgcaaccacct 
aaatcaataaaacagtctgatgaacccggtaaagagcaaattgatatcaaagattttgat 

aaagttgaaatcaaagcagcaaccattattgatgcggaaaatgtaaaaaaatcggagaaa
ctattaaaaataaaagttgaattagataatgaacaacgtcaaatagtatctggtatagct 
aagttttatcgtccggaagacattattggtaaaaaagttgcagttgttactaatttaaaa 
ccagctaaattgatgggacaaaaatccgaaggtatgattttgtcagctgaaaaagatggc 
gtacttaccttgataagcttgcctagcgcaattccaaatggtgcagtaattaaatag 


Sequence 860 

MQGYDVRYLTGTDEHGQKIQEKAQKAGKTELEYLDEMISGIKNLWSKLEISNDDFIRTTE
ERHKQVVEKVFERLLKQGDIYLGEYEGWYSVPDETYYTESQLVDPVYENGKIVGGKSPDS
GHEVELVKEESYFFNINKYTDRLLEFYDENPDFIQPPSRKNEMINNFIKPGLEDLAVSRT

SFDWGVRVPSNPKHVVYVWIDALVNYISSLGYLSDDETLFNKYWPADIHLMAKEIVRFHS
IIWPILLMALDLPLPKKVFAHGWILMKDGKMSKSKGNVVDPNVLIDRYGLDATRYYLMRE 
LPFGSDGVFTPEAFVERTNYDLANDLGNLVNRTISMINKYFHGELPAYQGPKHELDEKME 
AMALETVKSFNDNMESLQFSVALSTVWKFISRTNKYIDETQPWVLAKDENQREMLGNVMA 
HLVENIRFATILLQPFLTHAPREIFKQLNINNPDLHQLDSLQQYGMLSEAITVTEKPTPI 

FPRLDTEAEIAYIKESMQPPKSIKQSDEPGKEQIDIKDFDKVEIKAATIIDAENVKKSEK
LLKIKVELDNEQRQIVSGIAKFYRPEDIIGKKVAVVTNLKPAKLMGQKSEGMILSAEKDG
VLTLISLPSAIPNGAVIK* 

Sequence 861 
Contig_0521_pos_2987_2214,

is similar to (with p-value 3.0e-99)

>sp:sp|P37545|YABD_BACSU HYPOTHETICAL 29.2 KD PROTEIN IN MET
S-KSGA INTERGENIC REGION. >gp:gp|D26185|BAC180K_102 B. subti
lis DNA, 180 kilobase region of replication origin. NID: g46
7326. >gp:gp|Z99104|BSUB0001_39 Bacillus subtilis complete g
enome (section 1 of 21): from 1 to 213080. NID: g2632267.
atgatgttaatcgatacgcatgtacatttaaatgatgaacaatatgatgaggatttaaat 
gaagtgatatctcgtgcgagagaagcaggcgtagatagaatgtttgtagtaggttttgat 
acacctactattgaacgtactatggagctaatagataagtatgactttatctatggtatt
atcggttggcatcctgttgatgcaatagattgtactgatgaaagattggaatggatagaa
agtctttctaaacatcctaaaattattggtattggtgagatggggttagattatcattgg 
gataaatcaccttctgatgtacaaaaagaggtatttaaaaagcaaattgcattagctaaa 
cgtgttcaattacctattattattcataatcgtgaagcgactcaagattgcatagatatt 
ttgattgaagaacatgcagaagaagtgggcggaataatgcatagttttagtgcttcacct
gaaattgctgatgtcgtgattaataaattgaacttctatgtttcgcttggaggacccgtc 
actttcaaaaatgcaaaacaaccaaaagaagttgctaaacacgtaccaatggatcgtttg 
ttatgcgagacagatgccccgtatctatccccgcacccttatagaggtaaacgtaatgaa 
ccagaacgtgttactttagtagcacaacaaattgcagatttgcgtggtatgacttatgaa 
gaggtctgtcgccaaacaaccgaaaatgctgaacgtttattcaatttgaattaa

Sequence 862 

MMLIDTHVHLNDEQYDEDLNEVISRAREAGVDRMFVVGFDTPTIERTMELIDKYDFIYGI 
IGWHPVDAIDCTDERLEWIESLSKHPKIIGIGEMGLDYHWDKSPSDVQKEVFKKQIALAK
RVQLPIIIHNREATQDCIDILIEEHAEEVGGIMHSFSASPEIADVVINKLNFYVSLGGPV
TFKNAKQPKEVAKHVPMDRLLCETDAPYLSPHPYRGKRNEPERVTLVAQQIADLRGMTYE 
EVCRQTTENAERLFNLN* 

Sequence 863 
Contig_0521_pos_2007_1462,

is similar to (with p-value 2.0e-43)

>sp:sp|P37547|YABF_BACSU HYPOTHETICAL 20.7 KD PROTEIN IN MET
S-KSGA INTERGENIC REGION. >gp:gp|D26185|BAC180K_104 B. subti
lis DNA, 180 kilobase region of replication origin. NID: g46
7326. >gp:gp|Z99104|BSUB0001_41 Bacillus subtilis complete g
enome (section 1 of 21): from 1 to 213080. NID: g2632267.
atgaaaataaacgaatttattgttgttgaaggtaaagatgatactgaacgagtaaaaagt 
gctgtagattgtgatactattgaaacaaacggtagtgcaattgatacatatactttagaa 
gtgatacaacacgcgcagcaaactagaggtgtgattgtactgacagatccagatttccca 

ggtgataaaattagaaatactatacgggaacatgtttctggtgttaaacacgcatatgta
gatagagaaaaggccaaaagtaaaagaggtaaaataggaatagaacatgcaaacattaaa 
gatattcaagaagcattaatgcatgtaagttcaccacttgaagaagctaaagaaactatt 
gataaaagtgtactcattgatttgggattaattatcggtaaagatgcaagataccgtaga 
aatatcttaggtcgaaaattacacatcggtcactctaatggaaagcaattattaaagaaa 

cttaatgcttttggctatactgaagacgatgtcagaaaagcgctatttgaagaagaggag
aattaa 

Sequence 864 

MKINEFIVVEGKDDTERVKSAVDCDTIETNGSAIDTYTLEVIQHAQQTRGVIVLTDPDFP 
GDKIRNTIREHVSGVKHAYVDREKAKSKRGKIGIEHANIKDIQEALMHVSSPLEEAKETI
DKSVLIDLGLIIGKDARYRRNILGRKLHIGHSNGKQLLKKLNAFGYTEDDVRKALFEEEE
N* 

Sequence 865 
Contig_0521_pos_1461_571,

is similar to (with p-value 2.0e-99)

>sp:sp|P37468|KSGA_BACSU DIMETHYLADENOSINE TRANSFERASE (EC 2
.1.1.-) (S-ADENOSYLMETHIONINE-6-N',N'-ADENOSYL(RRNA) DIMETH
YLTRANSFERASE) (16S RRNA DIMETHYLASE) (HIGH LEVEL KASUGAMYCI
N RESISTANCE PROTEIN KSGA) (KASUGAMYCIN DIMETHYLTRANSFERASE)
. >gp:gp|D26185|BAC180K_105 B. subtilis DNA, 180 kilobase re
gion of replication origin. NID: g467326. >gp:gp|Z99104|BSUB
0001_42 Bacillus subtilis complete genome (section 1 of 21):
from 1 to 213080. NID: g2632267.

atggaatataaagatatagcaacaccatctcgaacacgtgctttgcttgatcaatatggg
tttaattttaagaaaagtttaggacaaaattttctaatagatgtaaatatcattaataaa 
attatcgaagcgagtcatatagattgtacaacgggtgtaattgaagttggaccaggtatg 
ggatcattgactgaacaacttgcaaagaatgctaagaaggtgatggcttttgaaattgat 
caaagattaatacctgtgcttaaagatacactttcaccatacgataatgtaacaattatc
aatgaagatatacttaaagctgatattgctaaagctgtagatacacatctacaagattgt
gacaagattatggttgttgctaatttaccgtattatattaccacacctattttacttaat 
ttgatgcaacaggatgtacctattgatggttttgtcgtaatgatgcaaaaagaggtagga 
gaacgtttgaacgctcaagtaggtaccaaagcatacggttcgttatcgattgttgctcaa 
tactatacggagacaagtaaagttttaacagttcctaaaactgtatttatgcctcctcca
aacgttgattctatcgttgtaaaattgatgcaacgccaagaaccacttgtacaggttgat 
gatgaggaaggcttttttaagttagcaaaggccgcttttgcacaacgacgtaaaacaatt 
aataataactaccaaaacttctttaaagatggtaagaagaataaagaaactatacgacag 
tggctagaaagcgctggtattgatcctaaaagacgtggagaaacactcacgattcaagat 
ttcgccacattatatgaacaaaagaaaaaattctccgaattaacaaattaa

Sequence 866

MEYKDIATPSRTRALLDQYGFNFKKSLGQNFLIDVNIINKIIEASHIDCTTGVIEVGPGM
GSLTEQLAKNAKKVMAFEIDQRLIPVLKDTLSPYDNVTIINEDILKADIAKAVDTHLQDC
DKIMVVANLPYYITTPILLNLMQQDVPIDGFVVMMQKEVGERLNAQVGTKAYGSLSIVAQ
YYTETSKVLTVPKTVFMPPPNVDSIVVKLMQRQEPLVQVDDEEGFFKLAKAAFAQRRKTI
NNNYQNFFKDGKKNKETIRQWLESAGIDPKRRGETLTIQDFATLYEQKKKFSELTN* 

Sequence 867 

Contig_0522_pos_1721_315, 
putative peptide of unknown function 

atggcaattctaacaccattatcaactacggattggcatgcatataaagttaatctaagt 
caatatttgactcaagaaaatggtcgttatttaggacatttatttgaatgggttgccgta 
cataataacataataagagctttaatatatgcgataacttcgtttttagttatctattta 
gttgcttatatggttcaattacatacgaatcgtatttattttattttgagttttgtgtta 
atggttactgtacctaatacaatttatagcgaaacttacgggtggtttactggatttttt 
agttatatacctgctacagtcctatcactttttattctttttacggtagttaaaaagatt 
gagtcgcacgatacagtttctgaaatgcaattatgggtatttttattagtaagtttgttt 
ggacaattcttcttggagaatctttccatcgctaatagcttaattattttaataggaatg 
gtagtctatttctttgttaaaaaaagactcagttatttcttaattgtaggatttatgctt 
agttgtataggtaacattataatgtttttaaacttcaattattttttaattaaggatgga 
ttaaatacgcattattcaatttccgatagtcatggaatgatacataaagcaggtgtgacg 
ttatttaagcttgtaccagaatatatgtttattaatcaaatgattattcttaccgtgata 
tcaatagtaagtatagttttacttaagcaaaataaaagcctgaagcatatgagagtttat 
attaaaataccactactcttaggtttaattactttacctatttataagatcttcgtttac 
aatcaatttcattttgaattatataaagcttcattttctatagccgttttgaatacaacg 
atttgcttcatttacatgataagtgtgatatacgttgtgtttaaaatgatacagcaaaga 
tacataagaatgattgtgatggggagttttatagctatggcttcatctgttttgccactt 
ttatttgtgacgcctataagttatagaaatttttattttatttatactttatggatcgtg 
atattactttgtttaattcagcaatgtgatgtgctatttaaacaacttgaacatataatt 
aaaatatttgcgattatcatcagcatcattatgatgattggatttacttttatacatatt 
agtagtgtgcacagaatagacttcattaaagaacaaataacacaacatcatcgctatcag 
aaaataacattggaaagattaccatttgagcgatatactcatatgactacaccaaagtcg 
aaggaacaacttcaagatttcaaacactattatgatttgcccaaagacatcacatttaaa 
gtagtcccatatggtacaaaacaataa 

Sequence 868 

MAILTPLSTTDWHAYKVNLSQYLTQENGRYLGHLFEWVAVHNNIIRALIYAITSFLVIYL 
VAYMVQLHTNRIYFILSFVLMVTVPNTIYSETYGWFTGFFSYIPATVLSLFILFTVVKKI
ESHDTVSEMQLWVFLLVSLFGQFFLENLSIANSLIILIGMVVYFFVKKRLSYFLIVGFML
SCIGNIIMFLNFNYFLIKDGLNTHYSISDSHGMIHKAGVTLFKLVPEYMFINQMIILTVI
SIVSIVLLKQNKSLKHMRVYIKIPLLLGLITLPIYKIFVYNQFHFELYKASFSIAVLNTT
ICFIYMISVIYVVFKMIQQRYIRMIVMGSFIAMASSVLPLLFVTPISYRNFYFIYTLWIV
ILLCLIQQCDVLFKQLEHIIKIFAIIISIIMMIGFTFIHISSVHRIDFIKEQITQHHRYQ
KITLERLPFERYTHMTTPKSKEQLQDFKHYYDLPKDITFKVVPYGTKQ*

Sequence 869 

Contig_0523_pos_5280_0,

is similar to (with p-value 4.0e-55)

>pir:pir|B26532|B26532 tyrA protein - Bacillus subtilis >gp:
gp|M80245|BACVARGNS_17 B. subtilis dbpA, mtr(A,B), gerC(1-3),
ndk, cheR, aro(B,E,F,H), trp(A-F), hisH, and tyrA genes, co
mplete cds. NID: g143798. >gp:gp|Z99115|BSUB0012_201 Bacillu
s subtilis complete genome (section 12 of 21): from 2195541
to 2409220. NID: g2634478.

atgactcatatacaatattgttattctgaaaattctgaaacaaggaggcgctctatgcga 
aatattttatttgtagggttaggccttattggcggtagcttggcgagtaatttaaaatat 
cattacagtaatttcaatattcttgcatacgattcggactacacacaacttgatgaagcc
ctttctataggtattattgatcaaaaagttaatgattatgctactgctgttgagatagcg
gatataatcatctttgcaactcctgttgagcaaacaattaaatatctatctgaacttaca 
aattacaatacaaaaactcatttgattgtaacagacacaggtagtaccaaacttactata 
caatcattcgaaaaagaattattaaaacatgatattcatttaattagtggtcatcctatg
gcaggaagtcataaatctggtgttttaaacgcgaaaaaacatttatttgaaaatgcttat
tacattcttgtatttaatgaaatcgaaaataatgaagccgcgacatatttaaagaaatta
cttaaacctacgttagcaaaatttatcgttactcatgcaaatgaacatgatttcgtaacc
ggtatagtgagtcatgttccacatatcatcgcttcaattttagttcatctaagtgctaat 
catgtcaaagaccattctttaatcgaaaaattagcagccggtggctttagagatataact 
cgtatagcaagtagtaatgctcagatgtggaaggatatcactttaaataatcaaaatcat
attttatctttacttaacgagattaaagaacaaattactggtattgaaaatttgata

Sequence 870 

MTHIQYCYSENSETRRRSMRNILFVGLGLIGGSLASNLKYHYSNFNILAYDSDYTQLDEA 
LSIGIIDQKVNDYATAVEIADIIIFATPVEQTIKYLSELTNYNTKTHLIVTDTGSTKLTI
QSFEKELLKHDIHLISGHPMAGSHKSGVLNAKKHLFENAYYILVFNEIENNEAATYLKKL
LKPTLAKFIVTHANEHDFVTGIVSHVPHIIASILVHLSANHVKDHSLIEKLAAGGFRDIT
RIASSNAQMWKDITLNNQNHILSLLNEIKEQITGIENLI 

Sequence 871 
Contig_0523_pos_4745_3273,

is similar to (with p-value 2.0e-78) 

>sp:sp|Q02001|TRPE_LACLA ANTHRANILATE SYNTHASE COMPONENT I (
EC 4.1.3.27). >pir:pir|S35124|S35124 anthranilate synthase (
EC 4.1.3.27) alpha chain - Lactococcus lactis subsp. lactis
>gp:gp|M87483|LACTRPOP_2 L. lactis trpE, trpG, trpD, trpF, t
rpC, trpB trpA genes, complete cds. NID: g149514.
gtggtaccgcgcgtcagcgtccttataacgaaggaggctgatttttttcggaaaaggagg 
aaatcaatggatattgtatacaaaaaggtgaatgctcaaattacgccagaagctttagca 
aaattaaaacaaaaaaagatcatttttgaaagtacaaatcaacagaaacttaaaggtagg
tactcgatagtagtattcgatcattatggcaaaattacattagataattctcaactttta
attaagttagacaatcattgtgaaatagttaagaatcaaccgtatcaacgacttaaggaa 
tttgtagataaatattattttgaaatcaaagataaatatttaaaagatttaccttttatt 
tcgggctttatagggacatgtagctttgatttagtacgacatgaatttaaaaaattacaa 
gatattaaattagaagatcatcaaactcatgatgtccaattttatctagtggaagatgta
tttgtttttgatcattataaagatgaattatatattatcgcaagtaacttattttcttat
agaacaaaagagagattaaaggaatctattgaacgtaaaattgaagatttaaaaaacata 
catttttcggttgaggatataaattataaatccatccctcgacatataaccaccaatata 
tcagagcaacaatttgttcaaactattagaattttaaaaaagaaaattactgaaggagat 
atgtttcaagtagttccttcaagaatttatagttataaacaccattttcaacacaattta
catcaattaacttttcagttatatcaaaatttaaagcgacaaaatcctagtccatatatg
tattatattaataaagatgtaccgattgtaataggaagttctcctgaaagttttgtaaag 
gtaaaagatggaaaagtttatacgaatcctatagctggaacaattaaaagaggtcaaaat 
aaaaaagaagatgaaaataatgaaaagacattaatgaaagatgaaaaggaattgagtgaa 
catcgtatgctcgtagatttaggaagaaatgatattcatcgaataagtaaaacaggcact
tcacaaattaccaaactaatgacaatagaacgttatgaacatgtcatgcatatcgttagt
gaagttattggagaattaaaaccccatctatctcctatgagcgtcatcgcaagtttgcta 
ccaacgggtactgtctcaggtgcacctaaacttagagctatacagagaatatacgaatct 
tatccttataaaagaggtatctatagcggtggtgttgggtatatcaactgtaatcatcat 
ttagattttgcattggctatacgtaccatgattatcgatgaggaaaaagtcagtgtcgag
gcaggatgtggagtagtatatgattctattccagagaaagaacttgaagaaacaaaactt
aaagctaaaagtttattggaggtaactccatga 

Sequence 872 

VVPRVSVLITKEADFFRKRRKSMDIVYKKVNAQITPEALAKLKQKKIIFESTNQQKLKGR
YSIVVFDHYGKITLDNSQLLIKLDNHCEIVKNQPYQRLKEFVDKYYFEIKDKYLKDLPFI
SGFIGTCSFDLVRHEFKKLQDIKLEDHQTHDVQFYLVEDVFVFDHYKDELYIIASNLFSY
RTKERLKESIERKIEDLKNIHFSVEDINYKSIPRHITTNISEQQFVQTIRILKKKITEGD
MFQVVPSRIYSYKHHFQHNLHQLTFQLYQNLKRQNPSPYMYYINKDVPIVIGSSPESFVK
VKDGKVYTNPIAGTIKRGQNKKEDENNEKTLMKDEKELSEHRMLVDLGRNDIHRISKTGT
SQITKLMTIERYEHVMHIVSEVIGELKPHLSPMSVIASLLPTGTVSGAPKLRAIQRIYES
YPYKRGIYSGGVGYINCNHHLDFALAIRTMIIDEEKVSVEAGCGVVYDSIPEKELEETKL
KAKSLLEVTP* 

Sequence 873

Contig_0523_pos_2707_1712,

is similar to (with p-value 8.0e-51) 

>sp:sp|P17170|TRPD_LACCA ANTHRANILATE PHOSPHORIBOSYLTRANSFER
ASE (EC 2.4.2.18). >pir:pir|S42343|JS0340 anthranilate phosp
horibosyltransferase (EC 2.4.2.18) - Lactobacillus casei >gp
:gp|D00496|LBATRP_2 Lactobacillus casei DNA, trp operon (trp
D, trpC, trpF, trpB, trpA), complete cds. NID: g216754.
atgacccttcttgagaaaattaaacaaaataaatctttatctaaaaaagatatgcaatca 
tttattgttacactgtttgattcaaatatagaaaccaatgtaaaggttgaattattgaaa
gcttatacaaataaagacatgggtcaatatgagctaacgtatttagttgaatattttatc
cagacaaactatccaaaccaaccattttataataaagctatgtgtgtttgtggcacaggt 
ggagatcaatcaaatagctttaatatttctacaactgtagcttttgttgtagcaagtgca 
ggagtgccagtcattaaacacggtaataaaagtattacttcacattcaggaagtacagat 
gtattacatgaaatgaatataaaaacaaacaaaatgaacgaagtagagcaacaattaaat
ttgaaaggattagcattcataagtgcaactgattcttatccaatgatgaaaaagcttcaa
tcaattagaaaatcgattgcaacacctacaatttttaacttgattggaccattaattaat 
cctttcaaattaacttatcaagtgatgggggtatatgaagcttcacaacttgaaaatata 
gcacaaacattaaaggatttaggtagaaaacgagcaattttaattcatggtgcaaatggg 
atggatgaggccacgctttctggtgaaaatatcatttatgaagttagcagcgaaagagca
ttaaaaaaatatagtttaaaagcagaagaagtcggtttagcttatgcaaataatgacacg
ttgataggtggttcacctcaaacaaataaacaaattgcattgaatatcctaagtggcacg 
gatcactcaagtaaacgagatgtagttttgttaaatgctggaattgctttatatgttgct 
gagcaagtggaaagtatcaaacatggcgtagagagagcgaaatatctcattgatacaggt 
atggcaatgaaacaatatttaaaaatgggaggttaa 

Sequence 874

MTLLEKIKQNKSLSKKDMQSFIVTLFDSNIETNVKVELLKAYTNKDMGQYELTYLVEYFI 
QTNYPNQPFYNKAMCVCGTGGDQSNSFNISTTVAFVVASAGVPVIKHGNKSITSHSGSTD
VLHEMNIKTNKMNEVEQQLNLKGLAFISATDSYPMMKKLQSIRKSIATPTIFNLIGPLIN
PFKLTYQVMGVYEASQLENIAQTLKDLGRKRAILIHGANGMDEATLSGENIIYEVSSERA
LKKYSLKAEEVGLAYANNDTLIGGSPQTNKQIALNILSGTDHSSKRDVVLLNAGIALYVA 
EQVESIKHGVERAKYLIDTGMAMKQYLKMGG* 

Sequence 875 
Contig_0523_pos_1708_920,

is similar to (with p-value 7.0e-43) 

>sp:sp|Q01999|TRPC_LACLA INDOLE-3-GLYCEROL PHOSPHATE SYNTHAS
E (EC 4.1.1.48) (IGPS). >pir:pir|S35127|S35127 indole-3-glyc
erol-phosphate synthase (EC 4.1.1.48) - Lactococcus lactis s
ubsp. lactis >gp:gp|M87483|LACTRPOP_5 L. lactis trpE, trpG,
trpD, trpF, trpC, trpB trpA genes, complete cds. NID: g14951
4.

atgactattttaaatgaaattattgagtataaaaaaactttgcttgagcgtaaatactat 
gataaaaaacttgaaattttacaagataacggaaatgttaagaggagaaagctgattgat
tcacttaactatgatagaacattatcagttattgctgaaataaaatcgaaaagcccatct
gtacctcaattaccgcaacgtgatcttgttcaacaagttaaagattatcaaaaatatggt 
gctaatgctatttcaatattaactgatgaaaaatactttggcggtagttttgaacgatta 
aatcagttatcaaagataacatcgttaccagttttatgtaaagattttattattgataaa 
attcaaatagatgttgcaaaacgagctggtgcatctattattttattaatagtaaatatt
ttaagtgatgaccaattaaaagaattgtattcatatgcaacaaaccataatttagaagct 
ctagtagaagttcatacaattagagaacttgaacgtgcacaccaaattaaccctaaaatt 
attggtgttaataatcgtgatttaaaacgatttgaaaccgatgttctacatacaaataaa 
ttacttaagtttaaaaagtctaattgctgctacatttcagagagtggcattcatacaaaa 
gaagatgttgagaaaatagtagattcaagtattgacggtttacttgtaggggaggcatta
atgaaaacaaatgacttaagtcagtttttgcctagtttaaagttaaagaagaatctctat 
gatagttaa 

Sequence 876 

MTILNEIIEYKKTLLERKYYDKKLEILQDNGNVKRRKLIDSLNYDRTLSVIAEIKSKSPS
VPQLPQRDLVQQVKDYQKYGANAISILTDEKYFGGSFERLNQLSKITSLPVLCKDFIIDK 
IQIDVAKRAGASIILLIVNILSDDQLKELYSYATNHNLEALVEVHTIRELERAHQINPKI 
IGVNNRDLKRFETDVLHTNKLLKFKKSNCCYISESGIHTKEDVEKIVDSSIDGLLVGEAL 
MKTNDLSQFLPSLKLKKNLYDS* 

Sequence 877 

Contig_0523_pos_786_307,

putative peptide of unknown function 

gtgccagatcatatagagaaagtagtggtcgtagtaaatcctcaaatgtccaccataaag 
agaataattaatcaaactgatattaacacaatccaattacatggaaatgaaagcattcaa
ttaattagaaatattaagaaacttaattcaaaaataagaatcataaaagcaattccagca 
acaagaaatttaaataataacattcaaaagtataaagatgagatagacatgtttattata 
gatacaccatcaatcacatacggagggacaggtcaaagttttgactggaaattattaaaa 
aaaataaagggcgttgattttctcattgcgggtggtttggattttgaaaagataaaacga 
ttagaaatatattcatttggacaatgtggttatgacatctcaactggcattgagtcacat
aatgaaaaagattttaataagatgactcgaatattaaaatttttgaaaggagacgaatga 



Sequence 878 

VPDHIEKVVVVVNPQMSTIKRIINQTDINTIQLHGNESIQLIRNIKKLNSKIRIIKAIPA
TRNLNNNIQKYKDEIDMFIIDTPSITYGGTGQSFDWKLLKKIKGVDFLIAGGLDFEKIKR
LEIYSFGQCGYDISTGIESHNEKDFNKMTRILKFLKGDE*

Sequence 879
Contig_0524_pos_471_1280,

is similar to (with p-value 7.0e-48) 

>gp:gp|L19300|STAORFPHI_2 Staphylococcus aureus DNA sequence
encoding three ORFs, complete cds; prophage phi-11 sequence
homology, 5' flank. NID: g310601.

atgaaaagaagcgataaatatacggatgattatattgaacaacgttatgagtctcaacga
ccttattacaatacatattatcaaccaatagggaaaccaccgaaaaagaaaaaaagtaaa 
agaattttcttaaaagcaattatcactatattaattttattgattatattttttggtgtc 
atgtactttatttcttcaagagcaaatgtagatgatttaaaatcaattgaaaataaaagc 
gattttgttgctaccgaaaatatgcctaactatgtaaaaggcgcatttatttcaatggag
gatgagcgtttctataaacatcatggctttgatataaaaggaacgacaagggcattgttt
tcaactattagcgatagagatgtgcaaggtggaagtacaattacgcaacaagttgtaaag 
aattattattacgataatgaacgatcctttacaagaaaaatcaaagaattgtttgtagcg 
cgtaaagttgaaaagcaatacagtaaaaatcagattttaagtttctatatgaataatatt 
tattatggtgataatcaatatactgtagaaggtgctgcaaatcattattttggtgtaacg
gtcgataaaaacaattcaaatatgagtcagattagtgtgttacaaagtgctatattagca
agcaaagtaaatgcaccaagtgtgtatgatgtaaatgatatgtcgaataattacatcaat 
agagttaaaaccaatttagagaaaatgaaacaacaaaattttattagtgaatcacaatat 
caagaagctatgtctcaacttggaaattaa

Sequence 880

MKRSDKYTDDYIEQRYESQRPYYNTYYQPIGKPPKKKKSKRIFLKAIITILILLIIFFGV
MYFISSRANVDDLKSIENKSDFVATENMPNYVKGAFISMEDERFYKHHGFDIKGTTRALF
STISDRDVQGGSTITQQVVKNYYYDNERSFTRKIKELFVARKVEKQYSKNQILSFYMNNI
YYGDNQYTVEGAANHYFGVTVDKNNSNMSQISVLQSAILASKVNAPSVYDVNDMSNNYIN
RVKTNLEKMKQQNFISESQYQEAMSQLGN*

Sequence 881 

Contig_0524_pos_1558_2361,

putative peptide of unknown function

atgccaaaggtaactaaaatagaagtacaaaaaaagaataaagaacgctttaatctcttt 
ttagatggagaatttgaaatggggatagatattgatacattagttaaatttaacttaaaa 
aaagatcaaatacttgaaccgtcagatatgcagaatattcaagaatatgatcactaccgt 
cgaggtgttaatcttgcaattcaatacttgtcttataagaaacgtactgaaagagaagtt
atacagtatttagaaaaaaacgatattcaaagtaatgctattcaagatgtcattgactat
tgctataaggaaaaatttattgatcatgaagactacgcagaaagtttaaaaaacaccatg 
atacacactacagataaaggaccagaaatatatagacaaaaactctatcaattaggtatt 
gaagttacgattattgaaaaatatgtcgaagcatatgaacaacaacaaccattagatgac 
gtcatcaaagttgctgaaaaagtgatgaagtctaaaaagggtcctgaagcaaaggtaaag
caaaaagtaacacagtcacttctccaaaaaggatataagtttgaaacaattcaactagtt
atgaatgaaatagatttttctcaagacgaagaaacattagaccatttattgcaacgtgat 
ttagagaaagtctataataaaaattgtagaaaatatgacagtgataaaagtgttattaaa 
accatagaggcactcatgagaaaaggctataattatgataaaattaaatctaaattagaa 
gaaagcggtatatctaatgaataa 

Sequence 882 

MPKVTKIEVQKKNKERFNLFLDGEFEMGIDIDTLVKFNLKKDQILEPSDMQNIQEYDHYR 
RGVNLAIQYLSYKKRTEREVIQYLEKNDIQSNAIQDVIDYCYKEKFIDHEDYAESLKNTM 
IHTTDKGPEIYRQKLYQLGIEVTIIEKYVEAYEQQQPLDDVIKVAEKVMKSKKGPEAKVK 
QKVTQSLLQKGYKFETIQLVMNEIDFSQDEETLDHLLQRDLEKVYNKNCRKYDSDKSVIK
TIEALMRKGYNYDKIKSKLEESGISNE* 

Sequence 883 

Contig_0524_pos_2369_2668,

putative peptide of unknown function

atgagtgaaatgtccgagcaagaactaagacatgaaatacaattatttaaagaaaaaatg 
cgtaaagcagagatgaatggcattatgaatgaatatgatgtttatcaaagcaaagtgatt 
atagcagaaagctatcttgtggatcgcaataaaattgaacctggaaaaatctataaactc 
aatgatggtagtaaacagtactttaaagtagaacgactcaagggtgtatttgcatgggga
tttagaataaacagtagtgaacctgaggaaggtctaccattagcattattaaaattttag



Sequence 884 

MSEMSEQELRHEIQLFKEKMRKAEMNGIMNEYDVYQSKVIIAESYLVDRNKIEPGKIYKL 
NDGSKQYFKVERLKGVFAWGFRINSSEPEEGLPLALLKF*

Sequence 885 

Contig_0524_pos_2978_4201,

putative peptide of unknown function 

gtgagccaatatgtgagcgaactcgtacaactttttccttatgaagtaactgagcataaa
gttgaacaaattattcaatgggcacatttaggtgattacatagaagaaaaggtttctaat 
ttaagcgaaaaatcatacgcacaacttttacttagtattgcacgttcttcaaaaaacgat 
attatcattttaaatcatgttttatcacatttagatgaaacatttatagaaagagctaca 
gttttatcaaaagattatattgaagctaataaaacactagtcttaattgataatgatgta
gaaaagataagcaaaacaagtaactacataacatgggtatcacacggccagatacgcaag
gaaggttctctaaatcaggtattaccaacttttagagaacatgaaaaagatcgtactagt 
ttaaaatcagaaatggaaattgagaactttgattatgattggaagcagaatcgctctcgt 
attcctgaaatgacttataattttaaaagaatagaacgttataatcatgctaagccacca 
agatttctagtgagattttggaccttatttgtcagctttttgattgggcttgttttaatg
agcgtacttttctttaacaacttaggtatggtcaaactagggaacattaatacccaagca
tccatacagaatcagaataaagatacttacgaagaaaaacttgcatatggattagcatta 
gatggatcagttactttaaacgggtctaaagatttaaaagtacctaagtatagtttaatt
acaattactggagaaaataacaaaagatatcgggtcgaaatgaatcaaaggagatatagt 
gttagtaaaaatcaagtgttttatttcaatccagctgggttatacgaatctcatactttt
aagaaattgtcaccttatataaaatcaaattatagtacttacgtagagtactttaacagt 
cacttacatcaaaaacatgataaagtaacagaaacgcttagacctgataaagataaaaag 
tatgttgtaccgatcacgcaacaacctataaaaatgatatttggtgataatgataaactg 
tctggatttgttattccaatgacaaataaaacggaattgaaaaaaacatttaatatcacg 
aaagatgtatggattacaaaaagtggaagcggttattttatcgctgatatgaaagaagaa
aaatggatttatattgaattgtag 

Sequence 886 

VSQYVSELVQLFPYEVTEHKVEQIIQWAHLGDYIEEKVSNLSEKSYAQLLLSIARSSKND 
IIILNHVLSHLDETFIERATVLSKDYIEANKTLVLIDNDVEKISKTSNYITWVSHGQIRK
EGSLNQVLPTFREHEKDRTSLKSEMEIENFDYDWKQNRSRIPEMTYNFKRIERYNHAKPP
RFLVRFWTLFVSFLIGLVLMSVLFFNNLGMVKLGNINTQASIQNQNKDTYEEKLAYGLAL
DGSVTLNGSKDLKVPKYSLITITGENNKRYRVEMNQRRYSVSKNQVFYFNPAGLYESHTF
KKLSPYIKSNYSTYVEYFNSHLHQKHDKVTETLRPDKDKKYVVPITQQPIKMIFGDNDKL
SGFVIPMTNKTELKKTFNITKDVWITKSGSGYFIADMKEEKWIYIEL*

Sequence 887

Contig_0525_pos_396_2015,
is similar to (with p-value 0.0e+00)
>sp:sp|P45554|DNAK_STAAU DNAK PROTEIN (HEAT SHOCK PROTEIN 70
) (HSP70). >gp:gp|D30690|STANHS_3 Staphylococcus aureus gene
s for ORF37; HSP20; HSP70; HSP40; ORF35, complete cds. NID:
g487326.

atgggtacagattataaagtagatattgaaggtaaatcatatacaccacaagaactttca
gcaatgattttacaaaatttaaaaagcactgcagaaaactatttaggggatacagtagac
aaagctgttatcactgtccctgcttatttcaatgatggtgaacgtcaagcaactaaagat 
gctggtaaaattgcaggcttagaagttgaacgtattatcaacgaacctacagctgctgca 
cttgcttatggtttagataaaactgaaacagatcaaaaggttctcgtatttgacttaggt 
gggggaacatttgacgtatctattctagagttaggcgacggcgtatttgaagtattatca
actgccggagataataaacttggtggcgatgacttcgaccaagtgattattgattatctt
gtttcagaattcaagaaagagaatggtgtagatttatcacaagataaaatggcattacaa 
agattaaaagatgctgccgaaaaagctaaaaaagatttatcaggtgtttctcaaactcaa 
atttcattaccattcatttctgctggagaaaatggcccattacacttagaaattagttta 
actcgttctaaatttgaggaattagctgattcattaatcaaaaaaactatggaaccgact
cgtcaagcattaaaagatgctggtttatctacttcagaaatagatgaagttattttagtt
ggtggttcaacacgtattccggccgttcaagaagctgttaaaaaagaaattgggaaagaa 
ccacataaaggtgttaacccagatgaagttgtagcaatgggtgctgctattcaagctggt 
gtaatcacaggtgatgttaaagatgtagtattacttgatgttacgccattatctttaggt 
atcgaaattatgggtggacgtatgaacacattaattgaacgtaatactactattccaact
tccaaatcacaagtttattctacagcagctgacaatcaaccagcagtagatattcatgta
ttacaaggtgaacgtccaatggcatctgacaacaaaactttaggaagattccaattaact 
gacattccacctgcaccacgtggtgtacctcaaatcgaagtaacatttgatatcgataaa 
aacggtattgttaacgttacagctaaagatttaggtactaataaagaacaaaacattaca 
atacaatcaagctcatctctatctgatgaagaaatcgatcgcatggtgaaagatgctgaa
gaaaatgctgaagcagataaaaaacgtcgtgaagaagtagacttgcgaaacgaagcagat
agtctagtattccaagttgaaaaaacagttaaagacttaggcgaaaatattagcgatgaa 
gataagaaaaatgctgaagagaaaaaagatgcacttaaaacagcattagaaggtgaagac 
atcgacgatattaaagctaaaaaagaagaacttgaaaaagtaattcaggaattatctgca 
aaagtttatgaacaagctcaacaagcacaacaacaaggccaagaagaacaaggttctcaa
gatagcactgttgaagatgcagactttaaagaagttaaagatgacgaagataaaaaataa

Sequence 888 

MGTDYKVDIEGKSYTPQELSAMILQNLKSTAENYLGDTVDKAVITVPAYFNDGERQATKD
AGKIAGLEVERIINEPTAAALAYGLDKTETDQKVLVFDLGGGTFDVSILELGDGVFEVLS
TAGDNKLGGDDFDQVIIDYLVSEFKKENGVDLSQDKMALQRLKDAAEKAKKDLSGVSQTQ
ISLPFISAGENGPLHLEISLTRSKFEELADSLIKKTMEPTRQALKDAGLSTSEIDEVILV
GGSTRIPAVQEAVKKEIGKEPHKGVNPDEVVAMGAAIQAGVITGDVKDVVLLDVTPLSLG
IEIMGGRMNTLIERNTTIPTSKSQVYSTAADNQPAVDIHVLQGERPMASDNKTLGRFQLT
DIPPAPRGVPQIEVTFDIDKNGIVNVTAKDLGTNKEQNITIQSSSSLSDEEIDRMVKDAE
ENAEADKKRREEVDLRNEADSLVFQVEKTVKDLGENISDEDKKNAEEKKDALKTALEGED
IDDIKAKKEELEKVIQELSAKVYEQAQQAQQQGQEEQGSQDSTVEDADFKEVKDDEDKK*

Sequence 889 

Contig_0525_pos_2160_3281,

is similar to (with p-value 0.0e+00) 

>sp:sp|P45555|DNAJ_STAAU DNAJ PROTEIN (HSP40). >gp:gp|D30690
|STANHS_4 Staphylococcus aureus genes for ORF37; HSP20; HSP7
0; HSP40; ORF35, complete cds. NID: g487326.

atggccaaaagagactattatgaagtcttaggcgtaaacaaaagcgcttctaaagacgaa 
attaaaaaagcttatcgtaaattatcaaaaaaataccatccagatataaataaagaagaa 
ggcgcagacgaaaaattcaaagaaatctccgaagcatatgaagttttaagtgatgaaaac
aaacgtgcaaattatgatcaatttggtcatgacggaccacaaggcggatttggaagtcaa
ggctttggtggcagtgactttggtggatttgaagatattttcagctcattctttggtggc 
ggttcacgtcaaagagatcctaatgcacctcgcaaaggtgatgaccttcaatacacaatg 
acaataacatttgaagaggctgtattcgggacaaaaaaagaaatatcaataaaaaaagat 
gtaacatgtcatacatgtaacggtgatggggctaaacctggtacaagtaaaaaaaattgt
agctattgtaatggcgctggtcgtgtttctgttgaacaaaatactattttgggtagagtg
agaactgaacaagtttgtcctaaatgtgaaggtagtggacaagaatttgaagaaccatgt
ccaacatgtaaaggaaaaggtactgaaaataaaacagttaaactagaagtaactgttcct
gaaggtgtagataacgaacaacaagttcgtttagctggagaaggttcacctggtgttaac 
ggaggaccacatggtgacctatatgtggtgttcagagttaaaccatccaatacatttgaa
cgtgatggagacgatatctactataatctagatattagcttttcacaggctgcactaggt
gatgaaattaagatacctacattaaaaagtaatgttgttttaaccattccggcaggtaca 
caaacgggtaaacaattccgacttaaagataaaggtgtaaagaatgttcatggttatggc 
tacggggacttatttgtcaacataaaagtggttacaccaacaaaattaaatgaccgtcaa 
aaagaattattaaaagaatttgctgaaattaatggtgaaaatataaatgaacagtcatct
aatttcaaagatagagcgaaaagattctttaaaggagaatag

Sequence 890 

MAKRDYYEVLGVNKSASKDEIKKAYRKLSKKYHPDINKEEGADEKFKEISEAYEVLSDEN 
KRANYDQFGHDGPQGGFGSQGFGGSDFGGFEDIFSSFFGGGSRQRDPNAPRKGDDLQYTM
TITFEEAVFGTKKEISIKKDVTCHTCNGDGAKPGTSKKNCSYCNGAGRVSVEQNTILGRV
RTEQVCPKCEGSGQEFEEPCPTCKGKGTENKTVKLEVTVPEGVDNEQQVRLAGEGSPGVN
GGPHGDLYVVFRVKPSNTFERDGDDIYYNLDISFSQAALGDEIKIPTLKSNVVLTIPAGT 
QTGKQFRLKDKGVKNVHGYGYGDLFVNIKVVTPTKLNDRQKELLKEFAEINGENINEQSS 
NFKDRAKRFFKGE* 

Sequence 891 

Contig_0525_pos_3285_3893, 

is similar to (with p-value 7.0e-83) 

>sp:sp|P45557|PRMA_STAAU PROBABLE METHYLTRANSFERASE (EC 2.1.
1.-). >gp:gp|D30690|STANHS_5 Staphylococcus aureus genes for
ORF37; HSP20; HSP70; HSP40; ORF35, complete cds. NID: g4873
26.

atgaattggatggaactctcaattgtagttaatcacgaagtagaatacgatgttacagaa 
attcttgaaagttatggctctaatggagttgtaattgaagattcaaatattttagaagaa 
caacctattgataagtttggagaaatttatgacttaaaccctgaagactatcctgaaaaa
ggagttcgattaaaagcttactttaatgagttcacttataatgaaaacttaaaatccaac 
atcaattatgaaatattaagtcttcagcaaattgataaaacaatttatgattaccaggaa
aaacttattgccgaagtagattgggaaaatgaatggaagaattattttcatccatttaga 
gcttcaaaacaatttacgatagtaccaagttgggaatcatatgttaaagaaaatgataac
gaattgtgcattgaattagatccaggtatggcttttggaacaggtgatcatccaacgaca
agtatgtgtttaaaagcaattgaaacttttgtaaaaccaactgattcagttatcgacgtt 
ggaacagggtcaggcattttaagtattgctagtcatttacttggagttcaaagaataagg
gggatttga 

Sequence 892 

MNWMELSIVVNHEVEYDVTEILESYGSNGVVIEDSNILEEQPIDKFGEIYDLNPEDYPEK 
GVRLKAYFNEFTYNENLKSNINYEILSLQQIDKTIYDYQEKLIAEVDWENEWKNYFHPFR 
ASKQFTIVPSWESYVKENDNELCIELDPGMAFGTGDHPTTSMCLKAIETFVKPTDSVIDV
GTGSGILSIASHLLGVQRIRGI*

Sequence 893 

Contig_0525_pos_1838_1506, 
is similar to (with p-value 5.0e-42) 
>sp:sp|P45554|DNAK_STAAU DNAK PROTEIN (HEAT SHOCK PROTEIN 70
) (HSP70). >gp:gp|D30690|STANHS_3 Staphylococcus aureus gene
s for ORF37; HSP20; HSP70; HSP40; ORF35, complete cds. NID:
g487326.

atgtcttcaccttctaatgctgttttaagtgcatcttttttctcttcagcatttttctta 
tcttcatcgctaatattttcgcctaagtctttaactgttttttcaacttggaatactaga
ctatctgcttcgtttcgcaagtctacttcttcacgacgttttttatctgcttcagcattt 
tcttcagcatctttcaccatgcgatcgatttcttcatcagatagagatgagcttgattgt 
attgtaatgttttgttctttattagtacctaaatctttagctgtaacgttaacaataccg 
tttttatcgatatcaaatgttacttcgatttga 

Sequence 894

MSSPSNAVLSASFFSSAFFLSSSLIFSPKSLTVFSTWNTRLSASFRKSTSSRRFLSASAF 
SSASFTMRSISSSDRDELDCIVMFCSLLVPKSLAVTLTIPFLSISNVTSI* 

Sequence 895

Contig_0526_pos_555_1499,

is similar to (with p-value 0.0e+00) 

>gp:gp|U15783|SEU15783_1 Staphylococcus epidermidis orf334 p
rotein, putative multidrug resistance protein QacC, and QacC
1 genes, complete cds. NID: g622953.

atgacgaaaagtgggaaacaacgcccatggagagaaaagaagatagataatgtaagttat 
gcagatatactggaaattttaaaaataaaaaaggcttttaatgtaaaacaatgtggtaac 
gtcttagagttcaagccgactgatgaaggttatttgaagttacataagacatggttttgt 
aagtcgaaactctgcccagtttgtaattggaggcgtgctatgaaaaatagttatcaagct
caaaaagtgattgaagaagttgttaaagaaaaaccaaaagcgcgttggttatttttaaca
ctttcaacgaaaaatgcgatagatggggatactttagaacaaagtttgaaacatttaacg 
aaagcatttgataggttaagtagatataaaaaagtgaagcaaaatcttgttgggtttttg 
cgttcaacggaagtaacagttaataaaaatgatggtagttataatcaacatatgcatgtt 
ttattatgtgttgaaaatagttattttaagaataaagctaattatataactcaagaagaa
tgggttaatttatggcaaaaagcattacaagtaaattatcgacccgtagcaaatattaaa
gcgatcaaaccaaatcaaaaaggcgataaagatattcaagcagctatcaaagaaacctct 
aaatattcggttaagtcatctgattttttaactgatgatgatgaaagaaatcaagaaatc 
gtgaatgatttagaaaaaggtttatatcgaaaacgtatgttgagttatggtggtttgctt 
aaacaaaaacataagattttaaatttagatgatgccgaagatggcaatttgattaataca
agtgacgaagataaaacaacagacgaagaagaaaaagcacattcaattacggcaatttgg
aattttgaaaaacaaaattattatttaaaagatttgaaacgttag 

Sequence 896 

MTKSGKQRPWREKKIDNVSYADILEILKIKKAFNVKQCGNVLEFKPTDEGYLKLHKTWFC 
KSKLCPVCNWRRAMKNSYQAQKVIEEVVKEKPKARWLFLTLSTKNAIDGDTLEQSLKHLT
KAFDRLSRYKKVKQNLVGFLRSTEVTVNKNDGSYNQHMHVLLCVENSYFKNKANYITQEE 
WVNLWQKALQVNYRPVANIKAIKPNQKGDKDIQAAIKETSKYSVKSSDFLTDDDERNQEI 
VNDLEKGLYRKRMLSYGGLLKQKHKILNLDDAEDGNLINTSDEDKTTDEEEKAHSITAIW 
NFEKQNYYLKDLKR*

Sequence 897

Contig_0526_pos_2971_3645,
is similar to (with p-value 0.0e+00) 
>sp:sp|P14506|TRA1_STAAU TRANSPOSASE FOR INSERTION SEQUENCE
ELEMENT IS257 IN TRANSPOSON TN4003. >pir:pir|S04162|S04162 t
ransposase 1 - Staphylococcus aureus plasmid pSK1 transposon
Tn4003 >gp:gp|X13290|SATN4003_1 Staphylococcus aureus multi
-resistance plasmid pSK1 DNA containing transposon Tn4003. N
ID: g46747. >gp:gp|X13290|SATN4003_6 Staphylococcus aureus m
ulti-resistance plasmid pSK1 DNA containing transposon Tn400
3. NID: g46747. >gp:gp|U40259|SEU40259_11 Staphyloccous epid
ermidis trimethoprim resistance plasmid pSK639. NID: g176207
9. >gp:gp|U40381|SEU40381_1 Staphyloccous epidermidis plasmi
d pSK697 insertion sequence IS257(697A) putative transposase
gene, complete cds. NID: g1762091. >gp:gp|U40384|SEU40384_1
Staphyloccous epidermidis plasmid pSK818 insertion sequence
IS257(818A) putative transposase gene, complete cds. NID: g
1762097. >gp:gp|AF051916|AF051916_1 Staphylococcus aureus pl
asmid pJE1 remnant of replication protein Rep (rep), trimeth
oprim resistance protein DfrA (dfrA), thymidylate synthetase
ThyE (thyE), and putative transposase Tnp (tnp) genes, comp
lete cds; and unknown gene. NID: g3676404. >gp:gp|AF051916|A
F051916_7 Staphylococcus aureus plasmid pJE1 remnant of repl
ication protein Rep (rep), trimethoprim resistance protein D
frA (dfrA), thymidylate synthetase ThyE (thyE), and putative
transposase Tnp (tnp) genes, complete cds; and unknown gene
. NID: g3676404. >gp:gp|AF051917|AF051917_21 Staphylococcus
aureus plasmid pSK41, complete sequence. NID: g3676412.
atgaactatttcagatataaacaatttaacaaggatgttatcactgtagccgttggctac
tatctaagatatgcattgagttatcgtgatatatctgaaatattaaggggacgtggtgta 
aacgttcatcattcaacggtctaccgttgggttcaagaatatgccccaattttatatcaa 
atttggaagaaaaagcataaaaaagcttattacaaatggcgtattgatgagacgtacatc 
aaaataaaaggaaaatggagctatttatatcgtgccattgatgcagagggacatacatta 
gatatttggttgcgtaagcaacgagataatcattcagcatatgcgtttattaaacgtctc
attaaacaatttggtaaacctcaaaaggtaattacagatcaggcaccttcaacgaaggta 
gcaatggctaaagtaattaaagcttttaaacttaaacctgactgccattgtacatcgaaa
tatctgaataacctcattgagcaagatcaccgtcatattaaagtaagaaagacaaggtat 
caaagtatcaatacagcaaagaatactttaaaaggtattgaatgtatttacgctctatat 
aaaaagaaccgcaggtctcttcagatctacggattttcgccatgccacgaaattagcatc
atgctagcaagttaa 

Sequence 898 

MNYFRYKQFNKDVITVAVGYYLRYALSYRDISEILRGRGVNVHHSTVYRWVQEYAPILYQ 
IWKKKHKKAYYKWRIDETYIKIKGKWSYLYRAIDAEGHTLDIWLRKQRDNHSAYAFIKRL
IKQFGKPQKVITDQAPSTKVAMAKVIKAFKLKPDCHCTSKYLNNLIEQDHRHIKVRKTRY 
QSINTAKNTLKGIECIYALYKKNRRSLQIYGFSPCHEISIMLAS*

Sequence 899 
Contig_0526_pos_3744_4484,

is similar to (with p-value 3.0e-23) 

>sp:sp|P30267|YKAA_BACFI HYPOTHETICAL 50.9 KD PROTEIN IN KAT
A 3'REGION (ORF A). >pir:pir|S27491|S27491 hypothetical prot
ein A - Bacillus firmus >gp:gp|L02548|BACKATA2_1 B.firmus OR
FA and ORF B, complete cds. NID: g143118.

gtgccggtaacgattaataataactcttcaattttattagatcattttgtcacatggata 
agtagcgcattacctcttttaactaagatattcataatgattatcattatactaggtgct 
atttatccatttattaaagggacatggaatcggaataccgttgaaacaatttttagttta 
tttaaagttttgggagtcattataggcgttttgttaatttttaacattgggccaagttgg
ttacttaatgaacaaacgggaatgtatgtttttaactatttggtaattccggtaggatta
acagtacctgcaggaggcgcggtattagctttattagtaggatatggcttattagaattt 
gtaggtgtttatgcgcaaaaaattatgtacccgatatggaaaacgcctggacgttcagca 
gttaatgctttagcatcttttgttgctagttttgctgtgggtttacttataacgaataaa 
gagtataaagaaggtaaattcacggaaaaacaagctgttatcatagcaaccggcttttct
acagttactgtagcttttatgatagttattgctaaaaccttacacttaatggatatatgg 
aatttatatttttggtctaccttgtttgttactgctgcagtaacagcttgtacagttagg 
atttggcctatcagtaaaattagcaacacatattatgatcagccatttatagaagaagat 
acaagcgaattaaaaggttaa

Sequence 900 

VPVTINNNSSILLDHFVTWISSALPLLTKIFIMIIIILGAIYPFIKGTWNRNTVETIFSL 
FKVLGVIIGVLLIFNIGPSWLLNEQTGMYVFNYLVIPVGLTVPAGGAVLALLVGYGLLEF
VGVYAQKIMYPIWKTPGRSAVNALASFVASFAVGLLITNKEYKEGKFTEKQAVIIATGFS 
TVTVAFMIVIAKTLHLMDIWNLYFWSTLFVTAAVTACTVRIWPISKISNTYYDQPFIEED
TSELKG* 

Sequence 901 

Contig_0528_pos_686_1450,

is similar to (with p-value 5.0e-56)

>sp:sp|P39605|YWCG_BACSU HYPOTHETICAL 28.3 KD PROTEIN IN QOX
D-VPR INTERGENIC REGION. >pir:pir|S39698|S39698 hypothetical
protein - Bacillus subtilis >gp:gp|X73124|BSGENR_44 B.subti
lis genomic region (325 to 333). NID: g413923. >gp:gp|Z99123
|BSUB0020_106 Bacillus subtilis complete genome (section 20
of 21): from 3798401 to 4010550. NID: g2636240.
gtgggtatagtgtcagattatgtttatgagttgatgaaacaacatcattcagttagaaaa 
tttaagaatcaaccacttggttctgaaacggtagaaaaattagtagaggcgggacagagt 
gcttctacatccagttatcttcaaacttattctattattggtgttgaagatccaagcatt
aaagcgcgtttaaaggaagtgtcaggtcagccttatgttttagataatggttatttattt
gtatttgttttagattattatcgtcatcatttagtagatgaagttgcggcgtcaaatatg 
gagacatcatatggttctgcagaaggactattagtaggtacaatagatgttgcattagtt 
gcgcaaaacatggcagttgctgccgaagatatggggtatggaattgtttatttagggtca 
ttgcgtaatgatgttgcgcgagtgcgtgaaattttaaatttacctgattatacgtttccg
ttatttggtatggcagtaggtgaaccttctgatgaagaaaatgggtcacctaaaccgcgc
ttgccatttaaacatatttttcataaagaccagtatgatgcgaatcagcatcaacaacgt 
aaagaattggaagcatacgaccaagtagtgagtgaatattataaagaacgtactcacggt 
gtgcgtacagaaaattggtcacaacaaatagaaacatttctaggacgtaaaacacgttta 
gatatgttagatgaattgaaaaaagcaggatttattcaaagataa 

Sequence 902 

VGIVSDYVYELMKQHHSVRKFKNQPLGSETVEKLVEAGQSASTSSYLQTYSIIGVEDPSI 
KARLKEVSGQPYVLDNGYLFVFVLDYYRHHLVDEVAASNMETSYGSAEGLLVGTIDVALV 
AQNMAVAAEDMGYGIVYLGSLRNDVARVREILNLPDYTFPLFGMAVGEPSDEENGSPKPR 
LPFKHIFHKDQYDANQHQQRKELEAYDQVVSEYYKERTHGVRTENWSQQIETFLGRKTRL
DMLDELKKAGFIQR* 

Sequence 903 

Contig_0528_pos_2809_3198, 

putative peptide of unknown function

atgaggtgtaatacaatgccaaatacgatacctattgctaaagcggtaaagacacgttta 
gggaatgagacgtgttttcttgccataatatttaacattattaggaaaacaagtaacacg 
atgacattaattactgtaaaaagtatttccatatcgatatatagctcctttttaattttt 
agtattccgacatatgtaataagaataaagggggttaagaagaaattcaagttgttgatt
tatagttttcagattcatttaaagttgaatagagtgaaggtgattactagaacaatgaaa
aatgagaagtttatgtttatcagaaaagtgagcaaagaggacgtaaagaggtttatatta 
ataaaaaagacgcgtgcaatcatgctataa 

Sequence 904

MRCNTMPNTIPIAKAVKTRLGNETCFLAIIFNIIRKTSNTMTLITVKSISISIYSSFLIF
SIPTYVIRIKGVKKKFKLLIYSFQIHLKLNRVKVITRTMKNEKFMFIRKVSKEDVKRFIL
IKKTRAIML* 

Sequence 905

Contig_0528_pos_2962_1574, 

is similar to (with p-value 0.0e+00) 

>sp:sp|P54596|YHCL_BACSU HYPOTHETICAL 49.0 KD PROTEIN IN CSP
B-GLPP INTERGENIC REGION. >gp:gp|X96983|BS75DGREG_13 B.subti
lis chromosomal DNA (region 75 degrees: cspB upstream of glp
PFKD operon). NID: g1239975. >gp:gp|Z99108|BSUB0005_181 Baci
llus subtilis complete genome (section 5 of 21): from 802821
to 1011250. NID: g2633055.
atggaaatactttttacagtaattaatgtcatcgtgttacttgttttcctaataatgtta 

15 aatattatggcaagaaaacacgtctcattccctaaacgtgtctttaccgctttagcaata 
ggtatcgtatttggcattgtattacacctcatatatggtgcagagtctaaaactctcgaa 
caatcaacagactggtttagtattgttggagatggttatgttgcactattacaaatgatt 
gtcatgccactaatattcatttcaattgttgccgcttttagcaaaatacaaattggtgaa 
aaattcgctaagatcggttcttatatttttatgtttttaattggtactgtagccattgca 

20 gctatcgttggaattttttacgctttgatctttggtttagatgcatcgtctattgattta 
ggtagtgcagaacattcacgtggtacagaaatttcaaaacaagccaaagatttaactgca 
aacactttaccacaacaaattctcgaagtattcccaagcaatccatttttagatttcaca 
ggacaacgtacaacttcgacaattgcagttgttatttttgcaacgtttgtgggctttgct 
tatcttagagttgcaagaaaacagccggaacatggaagcttacttaaacgtggtatagaa 
gcaatctattctatcgttatggctatcgtaacttttgttttacgattaacgccttatggc
attttagctattatggcttctactcttgcgacaagtgatttttctgcaatttggacgtta 
ggtaaattcttaattgcttcatacgcagctctaatcacaatgtatattatccatttaatt 
atactgagtgtcttaggtatcaatcccgttaaatacgtgaaaaagacaatagaagtacta 
atctttgcatttacttcacgttcaagtgcaggtgcattaccgttaaatgttcaaacgcaa 

30 acaaaacgtttaggtgtacctgagggaattgcaaacttctctgcaacttttggtttatcc 
atagggcaaaatggctgtgcaggaatctatcctgctatgctagcagttatggtggcacca 
gtagcaaatgtagaaattgacttccaatttgttgttacacttattgctgttgttattata 
agttcatttggcgttgcaggtgtaggtggcggggcaacattcgcatcaatactcgtatta 
tctacacttaatctaccagttgctctcgcaggggtactgatatctatcgaacctctcatc 

35 gatatgggtcgtacagcacttaacgttaatgactcaatgctagctggaacaggtaccgca 
cgcttaacgaatcattgggacaaaaaaacatttgactcaaatgattacggcgatttatct
gcaaattaa 

Sequence 906 

40 MEILFTVINVIVLLVFLIMLNIMARKHVSFPKRVFTALAIGIVFGIVLHLIYGAESKTLE 
QSTDWFSIVGDGYVALLQMIVMPLIFISIVAAFSKIQIGEKFAKIGSYIFMFLIGTVAIA 
AIVGIFYALIFGLDASSIDLGSAEHSRGTEISKQAKDLTANTLPQQILEVFPSNPFLDFT 
GQRTTSTIAVVIFATFVGFAYLRVARKQPEHGSLLKRGIEAIYSIVMAIVTFVLRLTPYG 
ILAIMASTLATSDFSAIWTLGKFLIASYAALITMYIIHLIILSVLGINPVKYVKKTIEVL 

IFAFTSRSSAGALPLNVQTQTKRLGVPEGIANFSATFGLSIGQNGCAGIYPAMLAVMVAP
VANVEIDFQFVVTLIAVVIISSFGVAGVGGGATFASILVLSTLNLPVALAGVLISIEPLI
DMGRTALNVNDSMLAGTGTARLTNHWDKKTFDSNDYGDLSAN* 

Sequence 907 

Contig_0528_pos_1216_866,

is similar to (with p-value 1.0e-22) 

>sp:sp|P39605|YWCG_BACSU HYPOTHETICAL 28.3 KD PROTEIN IN QOX
D-VPR INTERGENIC REGION. >pir:pir|S39698|S39698 hypothetical
protein - Bacillus subtilis >gp:gp|X73124|BSGENR_44 B.subti
lis genomic region (325 to 333). NID: g413923. >gp:gp|Z99123
|BSUB0020_106 Bacillus subtilis complete genome (section 20
of 21): from 3798401 to 4010550. NID: g2636240.
gtgacccattttcttcatcagaaggttcacctactgccataccaaataacggaaacgtat 
aatcaggtaaatttaaaatttcacgcactcgcgcaacatcattacgcaatgaccctaaat 
aaacaattccataccccatatcttcggcagcaactgccatgttttgcgcaactaatgcaa
catctattgtacctactaatagtccttctgcagaaccatatgatgtctccatatttgacg 
ccgcaacttcatctactaaatgatgacgataataatctaaaacaaatacaaataaataac 
cattatctaaaacataaggctgacctgacacttcctttaaacgcgctttaa 


Sequence 908 

VTHFLHQKVHLLPYQITETYNQVNLKFHALAQHHYAMTLNKQFHTPYLRQQLPCFAQLMQ 
HLLYLLIVLLQNHMMSPYLTPQLHLLNDDDNNLKQIQINNHYLKHKADLTLPLNAL* 

Sequence 909

Contig_0528_pos_0_556,

putative peptide of unknown function 

atgacactaacaaataaagaagttgctaaagttttatttaaagcttatagatataaaaaa 
cccatcgatttcattagtgagaactatcaattaaacgaagaagaagcatatcatgtacaa 

15 gaagaactaattgaccaattaactttcaaagaccgttcgactgttacagggtataaagtt 
agtatgactagcaaggcaacgcaagcaattgctaacactaacgaacctgcatatggaaca 
ctcttatctaaccaaattgttaatgatggtgcctcagtctctctttcagaattattttca 
ccattactagaaccagaaattatctttatagtgcaggaagacttaccttatgatgctgat 
ttagaaacaattagatatcatacccgtatcgcgccaggcattgaaattccagatgcaaga 

20 tataaaaattggtttccaaattttactttatcagatttaatatcagataataccgcaaca 
ggacttgtcgtagtaggtgaccctgtagacggacttgataacgatgcatttgctaatgta 
catttaaatttataTT 

Sequence 910 

MTLTNKEVAKVLFKAYRYKKPIDFISENYQLNEEEAYHVQEELIDQLTFKDRSTVTGYKV
SMTSKATQAIANTNEPAYGTLLSNQIVNDGASVSLSELFSPLLEPEIIFIVQEDLPYDAD 
LETIRYHTRIAPGIEIPDARYKNWFPNFTLSDLISDNTATGLVVVGDPVDGLDNDAFANV
HLNLYX 

Sequence 911

Contig_0530_pos_5055_4645,

is similar to (with p-value 6.0e-48) 

>pir:pir|S39743|S39743 hypothetical protein - Bacillus subti
lis

35 atggaccctaaagtagctatgttaagcttttctacaaaaggttctgctaaatcggatgat 
gttactaaagtgcaagaagcattgaagttagctcaagaaaaagctgaagcagatcaatta 
gatcatgtagttattgatggagaattccaatttgacgctgctattgttcctagcgtagca 
gagaagaaagcacctggtgcaaaaattcaaggtgatgcaaatgtatttgttttccctagt
ctagaagcaggtaatattggttataagattgctcaacgtttaggtggatacgatgcagta 

40 ggaccagtcctacaaggattaaactctccagtcaatgatttatctcgtggttgctcaact 
gaagacgtttataacttatctattattacagctgctcaagctttacaataa 

Sequence 912 

MDPKVAMLSFSTKGSAKSDDVTKVQEALKLAQEKAEADQLDHVVIDGEFQFDAAIVPSVA
EKKAPGAKIQGDANVFVFPSLEAGNIGYKIAQRLGGYDAVGPVLQGLNSPVNDLSRGCST
EDVYNLSIITAAQALQ*

Sequence 913 

Contig_0530_pos_4573_3806,

is similar to (with p-value 5.0e-66)

>sp:sp|P39648|YWFL_BACSU HYPOTHETICAL 31.4 KD PROTEIN IN PTA
3'REGION. >pir:pir|S39745|S39745 hypothetical protein - Bac
illus subtilis >gp:gp|X73124|BSGENR_91 B.subtilis genomic re
gion (325 to 333). NID: g413923. >gp:gp|Z99123|BSUB0020_60 B
acillus subtilis complete genome (section 20 of 21): from 37
98401 to 4010550. NID: g2636240.

atgcaatcttttgcgtttgatgacactttttccgaaagcgttggtaaagatttatcttgt 
aatgtagtacgaacgtggatacatcaacacaccgtgattttgggcattcatgattcgcgt 
ttaccatttttaagtgatggtattcgttttcttacagatgaacaaggatataatgcaatt 
gttaggaattctggtggcttgggtgtcgtattagatcaaggaattttaaacatatctttg
atttttaaaggacaaaccgaaacgactattgatgaagcctttacagtgatgtatttattg 
attaataaaatgtttgaggatgaagatgttagtatcgatactaaagaaattgagcaatcg 
tattgcccaggaaaatttgatttaagtattaatgataagaaatttgccgggatttcgcag 
5 cgacgagtacgtggtggtatcgcagtgcaaatatacttatgtattgaaggttctggctca 
gaacgggcattaatgatgcaacagttttatcaacgtgcgcttaaaggggagactactaaa 
tttcactatccagacatagatccctcatgtatggcatctttagaaacccttttaaataga 
gaaattaaagtgcaagatgttatgtttttattattatatgcactaaaagatttaggggca 
aacttaaatatggatcctattacagaagacgagtggacacgttacgaagggtattatgat 
10 aagatgttagaacgcaatgcgaaaatgaatgaaaaattagatttttag 

Sequence 914 

MQSFAFDDTFSESVGKDLSCNVVRTWIHQHTVILGIHDSRLPFLSDGIRFLTDEQGYNAI 
VRNSGGLGVVLDQGILNISLIFKGQTETTIDEAFTVMYLLINKMFEDEDVSIDTKEIEQS 
YCPGKFDLSINDKKFAGISQRRVRGGIAVQIYLCIEGSGSERALMMQQFYQRALKGETTK
FHYPDIDPSCMASLETLLNREIKVQDVMFLLLYALKDLGANLNMDPITEDEWTRYEGYYD 
KMLERNAKMNEKLDF* 

Sequence 915 

Contig_0530_pos_3054_2179,

putative peptide of unknown function 

atgggtgaacacgcagttacatttggtcaaccggcaatcgcaattccatttaatgctgga 
aaaattaaagtcctcattgaaagtttagatgaaggtaattattcttctatcacaagtgac 
gtatatgacggaatgttatacgatgcccccgaacatctaaagtctatcattaatcgcttt 

gttgaaaaaagtggagtgaaagaaccactatcagtaaaaattcaaactaatttgcctcca
tcaagaggtttaggttcaagtgctgcagtagcagtagcgtttgtacgcgccagttatgat 
tttatggatcaacctttagatgacaaaacattgattaaagaagcaaattgggcggagcaa 
atcgcacatggtaagccaagcggtattgatacgcagacgattgtgtcaaataaacccgtc 
tggtttaaacaagggcaggccgaaaaattaaaatcactaaaattaaatggttatatggtt 

30 gtcattgatactggagtaaagggttctaccaaacaagcagtagaagatgttcatgtatta 
tgtgaatctgatgaatatatgaaatatatagagcacattggtacacttgttcacagtgct 
agcgaatcgattgaacagcatgatttccatcatttggctgacatatttaacgcatgtcaa 
gaagacttgagacatttaacagtaagtcacgataaaatagaaaaattacttcaaattggg 
aaagaacatggtgccattgctggtaaactaactggtggaggaagaggtggcagcatgctt 

35 cttcttgcggaaaatttaaaaactgcaaagactattgttgctgctgttgaaaaagctggc 
gcagcacatacatggattgaacatttaggaggttaa 

Sequence 916 

MGEHAVTFGQPAIAIPFNAGKIKVLIESLDEGNYSSITSDVYDGMLYDAPEHLKSIINRF
VEKSGVKEPLSVKIQTNLPPSRGLGSSAAVAVAFVRASYDFMDQPLDDKTLIKEANWAEQ
IAHGKPSGIDTQTIVSNKPVWFKQGQAEKLKSLKLNGYMVVIDTGVKGSTKQAVEDVHVL 
CESDEYMKYIEHIGTLVHSASESIEQHDFHHLADIFNACQEDLRHLTVSHDKIEKLLQIG 
KEHGAIAGKLTGGGRGGSMLLLAENLKTAKTIVAAVEKAGAAHTWIEHLGG* 

Sequence 917

Contig_0530_pos_1178_102, 

putative peptide of unknown function 

atgattcaggtaaaagcccccggaaaactttatattgcaggcgagtatgcagtaaccgaa 
ccaggatataaatctattcttattgcagtaaatcgctttgtaacggcgacaattgaggcg 

50 tcaaataaagttgaaggtagtattcattccaaaacattacattatgaaccagttaaattt 
gaccgtaatgaagatagaattgaaatctcagatgttcaagctgctaagcaactgaaatat 
gttgtgacagctatagaagtgtttgaacagtatgtgcgcagttgcaatatgaatttaaag 
cactttcatttaaccattgatagtaacttagcagataactctggtcagaagtacggatta 
ggttcaagcgccgctgttttagtatctgttgttaaagctttgaatgaattctatggtttg 

55 gaattatcaaacctttatatttataaattagctgtaattgcaaatatgaaattacaaagt 
ttaagttcatgtggcgatattgcggttagtgtctacagtggttggcttgcatatagtacg 
tttgaccatgactgggtgaaacagcaaatggaagaaacatcggtgaatgatgttttggaa 
aaaaattggccaggcttacatatcgaacctttacaagctcccgaaaatatggaagtcctt 
attggatggactgggtctccagcttcttctccacacttagtgagtgaggtcaaacgttta



aaatcagatccaagtttttatggtgattttttagatcaatctcatgcttgtgtagaaagt 
ttaatccaagcttttaaaactaataatatcaaaggtgttcaaaagatgatacgtataaac 
agacgtattattcaatctatggataacgaagcatcagttgaaattgaaacagataagcta 
aaaaaattatgtgatgtcggtgaaaagcacggtggcgcttctaaaacttcaggtgctggt 
5 ggtggcgattgtggcattactattatcaacaaggtaattgataaaaatattatttataac 
gaatggcaaatgaatgatatcaaaccattgaaatttaaaatttatcacgggcaataa 

Sequence 918 

MIQVKAPGKLYIAGEYAVTEPGYKSILIAVNRFVTATIEASNKVEGSIHSKTLHYEPVKF 
10 DRNEDRIEISDVQAAKQLKYVVTAIEVFEQYVRSCNMNLKHFHLTIDSNLADNSGQKYGL 
GSSAAVLVSVVKALNEFYGLELSNLYIYKLAVIANMKLQSLSSCGDIAVSVYSGWLAYST 
FDHDWVKQQMEETSVNDVLEKNWPGLHIEPLQAPENMEVLIGWTGSPASSPHLVSEVKRL 
KSDPSFYGDFLDQSHACVESLIQAFKTNNIKGVQKMIRINRRIIQSMDNEASVEIETDKL 
KKLCDVGEKHGGASKTSGAGGGDCGITIINKVIDKNIIYNEWQMNDIKPLKFKIYHGQ* 


Sequence 919 

Contig_0531_pos_1619_3223,

is similar to (with p-value 5.0e-67) 

>sp:sp|P54417|OPUD_BACSU GLYCINE BETAINE TRANSPORTER OPUD. >
gp:gp|AF008220|AF008220_90 Bacillus subtilis rrnB-dnaB genom
ic region. NID: g2293135. >gp:gp|Z99119|BSUB0016_80 Bacillus
subtilis complete genome (section 16 of 21): from 2997771 t
o 3213410. NID: g2635411. >gp:gp|U50082|BSU50082_1 Bacillus
subtilis glycine betaine transporter OpuD (opuD) gene, compl
ete cds. NID: g1524396.

atggactggacgaccttcataggcgtagtcattgtgttactttttgctgttatacctatg 
atggtttttccgaaagcaagtgaaataatcattaccgatatcaatagtgccatttctaat 
tcaattggatcggtatatctctttatgggactggctatattttgttttgttttatacata
gcatttggtaagtatgggaatgtcacgttaggaaaagcgactgacaaacctgaatttaat 

30 aatttcacatgggcagccatgttattctgtgccggtattggttcagatattttatattgg 
ggtgttattgagtgggcattttattatcaagtacctcctaacggtgcaaaatcaatgtcc 
gatcaagcacttcaatatgcaactcaatatggtatgtttcactggggacctatagcctgg 
gcaatatatgtgctaccagctttgccaatcggttatttagttttcgttaagaagaaaccc 
gtctataaaattagtcaagcttgtcgaccaattttaaaaggacatacggataaattatta 

35 ggaaaaatcgtagatattttatttattttcggtttgctcggtggtgctgcaacatcactc 
gctctaggcgtgccgatgatctcagctggtattgaacgattgactggtttagatggatct 
aatatgattttacgttcaatcatcttactaactattacagttattttcgcaatcagttct
tacacaggtttgaaaaaaggtattcaaaaattaagtgatgttaacgtttggttatcattt 
ttattattagcatttgtattcatcgtaggtccaactgtgtttattatggaaactacagtt 

40 acagggttcggtaatatgataaaagatttcttccatatggcgacatggatggaaccattt 
ggtggcataaaaggtcgtaaagaaacgaatttccctcaagattggacaatattctactgg 
tcatggtggctcgtttatgcaccgtttattggattgtttatcgcgcgtatctcaaaagga 
cgtacacttaaagaagttgtattaggaacaatatgctatggaacattaggttgtgtgtta 
tttttcggtatttttggtaactatgctgtatatctacaaattactgagcaatttaatgta 

45 ataagctatttaaacaattatggtacagaggcaacaatcatagaaataatgcatcaacta 
ccattctcgacaattactattatcttattcttaatatcagctttcttattcttagcaaca 
acattcgattctggttcatatattttagcagcagcgtcacagaaaaaagtgataggagaa 
ccgttacgtgctaatcgtttgttctgggcgtttgcgttatgtttactaccgttctcttta 
atgctagttggaggagaacgtgcattagaagtattgaaaacagcatcattacttgctagt 

50 gtacctttaattgttatatttacgctaatgatgatttcgttcttaattatactcggacga 
gatcgtatcaagttagaaagacgtgcagataagcataaagaaattgaaagacgttctcta 
agaatagttcaggtcaaagacaaacctgaagacgataacttataa 

Sequence 920 

MDWTTFIGVVIVLLFAVIPMMVFPKASEIIITDINSAISNSIGSVYLFMGLAIFCFVLYI
AFGKYGNVTLGKATDKPEFNNFTWAAMLFCAGIGSDILYWGVIEWAFYYQVPPNGAKSMS 
DQALQYATQYGMFHWGPIAWAIYVLPALPIGYLVFVKKKPVYKISQACRPILKGHTDKLL
GKIVDILFIFGLLGGAATSLALGVPMISAGIERLTGLDGSNMILRSIILLTITVIFAISS 
YTGLKKGIQKLSDVNVWLSFLLLAFVFIVGPTVFIMETTVTGFGNMIKDFFHMATWMEPF 
GGIKGRKETNFPQDWTIFYWSWWLVYAPFIGLFIARISKGRTLKEVVLGTICYGTLGCVL
FFGIFGNYAVYLQITEQFNVISYLNNYGTEATIIEIMHQLPFSTITIILFLISAFLFLAT 
TFDSGSYILAAASQKKVIGEPLRANRLFWAFALCLLPFSLMLVGGERALEVLKTASLLAS
VPLIVIFTLMMISFLIILGRDRIKLERRADKHKEIERRSLRIVQVKDKPEDDNL* 


Sequence 921 

Contig_0531_pos_3494_3811,

is similar to (with p-value 3.0e-30) 

>gp:gp|Z99119|BSUB0016_180 Bacillus subtilis complete genome
(section 16 of 21): from 2997771 to 3213410. NID: g2635411.
>gp:gp|U47861|BSU47861_2 Bacillus subtilis gbsAB operon, gl
ycine betaine aldehyde dehydrogenase GbsA, alcohol dehydroge
nase GbsB genes, complete cds. NID: g1524391.
gtgtcaatttcacgttcccattttttagtaaaaaagttgcggaagaaagtgaagaaatct 
15 ttctctgcaataaaatgttgctttcggctaccacgcgtgaattgttgtttgacgatatcg 
tattcttgtagtttttttacacctgtactcatactaggtttactcatttgcagttgttgt 
cgcatttcatcaagtgtcatacttccttcaaaaaccataatgccatacaagttacctaca 
ctacggttgataccatacaaatccatggtttcaccgattgagttgataactaaatcttta
gcttcttcgatatattga 


Sequence 922 

VSISRSHFLVKKLRKKVKKSFSAIKCCFRLPRVNCCLTISYSCSFFTPVLILGLLICSCC 
RISSSVILPSKTIMPYKLPTLRLIPYKSMVSPIELITKSLASSIY* 

Sequence 923

Contig_0531_pos_3650_3276,

is similar to (with p-value 6.0e-26)

>gp:gp|Z99119|BSUB0016_180 Bacillus subtilis complete genome
(section 16 of 21): from 2997771 to 3213410. NID: g2635411.
>gp:gp|U47861|BSU47861_2 Bacillus subtilis gbsAB operon, gl
ycine betaine aldehyde dehydrogenase GbsA, alcohol dehydroge
nase GbsB genes, complete cds. NID: g1524391.
atgagtacaggtgtaaaaaaactacaagaatacgatatcgtcaaacaacaattcacgcgt 
ggtagccgaaagcaacattttattgcagagaaagatttcttcactttcttccgcaacttt 

35 tttactaaaaaatgggaacgtgaaattgacactaatatggaagcaattgaagatgcagaa 
aatatcattcacccacttcttgaaaagaatgatatagatgaggaagttaaaaaacaagca 
ataaatgtaaaagcgcagttagaccactctaaaatatattacaagtggctcgctcaatta 
agtgaagctctagagtcaggagaaatttttaactatttcccaattccagatgaacaacat 
catcatgatcaataa 


Sequence 924 

MSTGVKKLQEYDIVKQQFTRGSRKQHFIAEKDFFTFFRNFFTKKWEREIDTNMEAIEDAE
NIIHPLLEKNDIDEEVKKQAINVKAQLDHSKIYYKWLAQLSEALESGEIFNYFPIPDEQH
HHDQ* 


Sequence 925 

Contig_0531_pos_1883_1578,

putative peptide of unknown function 

atggctgcccatgtgaaattattaaattcaggtttgtcagtcgcttttcctaacgtgaca 
50 ttcccatacttaccaaatgctatgtataaaacaaaacaaaatatagccagtcccataaag 
agatataccgatccaattgaattagaaatggcactattgatatcggtaatgattatttca 
cttgctttcggaaaaaccatcataggtataacagcaaaaagtaacacaatgactacgcct
atgaaggtcgtccagtccataactttttcctttttcaaatgtgatccccctaatattaat 
ttatga 


Sequence 926 

DWVHVKLLNSGLSVAFPNVTFPYLPNAMYKTKQNIASPIKRYTDPIELEMALLISVMIIS 
LAFGKTIIGITAKSNTMTTPMKVVQSITFSFFKCDPPNINL* 

Sequence 927

Contig_0532_pos_1470_709,

is similar to (with p-value 4.0e-40) 

>sp:sp|P54717|YFIA_BACSU HYPOTHETICAL 29.3 KD PROTEIN IN GLV
G-GLVBC INTERGENIC REGION. >gp:gp|Z99108|BSUB0005_88 Bacillu
s subtilis complete genome (section 5 of 21): from 802821 to
1011250. NID: g2633055. >gp:gp|D50543|D50543_2 Bacillus sub
tilis DNA for 76-degree region, complete cds. NID: g1486240.
atgattttagatgaacgtgtaaactctaatttcgatcaattaaatgataatgatatacaa 

10 attgcacattatgttaatacacatatagatgtttgcaaaaatatgaaaatacaagattta 
gcctcacagacacatgcttcaaatgctacgattcatcgcttcactcgtaaactaggtttt 
gatggttatagtgactttaaatcctttttaaaatttgaagatagtaagaatcatcaactt 
ccttctgattctatggagcaatttaaacaagaaattgaaaatacattcaactatttagaa 
cgtattgattatcgtttattaactcacaaaatgcatcatgctacaacaatatacttatat 

15 ggtactggacgtgcacagatgaatgtcgctgaagaagcacaacgtatactgttgactatg 
cataaaaatattatattgttacatgatgttcatgaactaaagatggtgttaaacaagaca 
attccagaagatttgtttttcatcatttcactttctggcgaaacacatcaacttaaagaa
gtcacacaattgcttcaactgagacaaaaatattttatttccgtaacaacaatgaaagac 
aatacattggcacaacaagctgattacaatgtctatgtttcaagcaataccttctattta 

20 aacgatggtactgattattccagttttattagctatcacattttctttgaaacactacta 
agaaaatataacgaatataaagagaatcatgaattaacatag 

Sequence 928 

MILDERVNSNFDQLNDNDIQIAHYVNTHIDVCKNMKIQDLASQTHASNATIHRFTRKLGF
DGYSDFKSFLKFEDSKNHQLPSDSMEQFKQEIENTFNYLERIDYRLLTHKMHHATTIYLY
GTGRAQMNVAEEAQRILLTMHKNIILLHDVHELKMVLNKTIPEDLFFIISLSGETHQLKE
VTQLLQLRQKYFISVTTMKDNTLAQQADYNVYVSSNTFYLNDGTDYSSFISYHIFFETLL 
RKYNEYKENHELT* 

Sequence 929

Contig_0533_pos_907_1239, 

putative peptide of unknown function 

gtgattaggaatagccctccgataacaaatgaaatagacagagggaaattcatagtgaca 
tgtaaagcgattgctcctaaaatccatgcaattcccgctcctattgtaataattcttata 
acagctttagaaatgccttttaattctctaaaatctagattactactaccttcaaataat
ataattgctacagcaagagatacaattgaactaaatgcctcaggtccaagtgcctctttt 
ggatttgctaatccaaaaataggtcctacaagtaaacctacgatggccatgacaacaatc 
gatggccattttattctactcgctaaccattga 

Sequence 930

VIRNSPPITNEIDRGKFIVTCKAIAPKIHAIPAPIVIILITALEMPFNSLKSRLLLPSNN
IIATARDTIELNASGPSASFGFANPKIGPTSKPTMAMTTIDGHFILLANH* 

Sequence 931 

Contig_0533_pos_1649_2698,

is similar to (with p-value 1.0e-16) 

>sp:sp|P37520|YYAD_BACSU HYPOTHETICAL 37.7 KD PROTEIN IN RPS
F-SPO0J INTERGENIC REGION. >pir:pir|S18084|S18084 hypothetic
al protein 9 - Bacillus subtilis >gp:gp|D26185|BAC180K_52 B.
subtilis DNA, 180 kilobase region of replication origin. NI
D: g467326. >gp:gp|X62539|BS0RIGS_14 B. subtilis genes rpmH,
rnpA, 50kd, gidA and gidB. NID: g40020. >gp:gp|Z99124|BSUB00
21_199 Bacillus subtilis complete genome (section 21 of 21):
from 3999281 to 4214814. NID: g2636442.

atgagcaatgttaaagaatccatcattgttgccttcgcctttgtaggcgttgtagttgga
gcaggatttgcgacaggtcaggaaatttttcaattcttcactagtcatggtatttacagt 
attggcggtatttttattactggacttatcataactcttggaggaatattcgtattaaat 
actggctttcgtcttagatctcaaaaccactctgaatctattcgttattatttacatcca 
acaatagctaaattatttgatattatacttacagtatttttattttctctagcaattatt 



atgacagccggcggagcatcgactataaatgaaagttttggcttacctttttggttaagt 
tctctcattttagtgatacttattttgttaacattatttctaaagttcgatcgacttatc 
gctgttttaggaggggtaacaccatttcttgtggcagtcgtagtaatgattgcagtgtat 
tactttattaccggtgatttaaactttagtgacgtcagtcaatattcaaatcaaaataag 
5 tcgatttcacctggttggtggtttgacgcaattaattatgctagcttacaaattgctgct 
gcatttagctttttaactgtaatgggcggtaagctacgatatcaatcgtccacaatttat 
ggtggacttatcggtggtattattgtgactttactattacttttgattaattttggtctt 
gttacagaatttaatcaaattaaagaggtagcattaccatcattgctacttgctaagcag 
atttctccatccattggtattatcatgtctgtcattatggttttagtcatatacaataca 
10 gtagtaggtttaatgtacgcctttgcatcacgctttagtcgaccgtttacgaaacgctat 
tatattcttatagttatgatggcaataataacatttgcttgtacttttgtgggattcatt 
tctctcattggtaaagtgttccctattatgggactttttggttttatcttattgattcct 
gtgatatacaaaggaattttacgaaaataa 

15 Sequence 932 

MSNVKESIIVAFAFVGVVVGAGFATGQEIFQFFTSHGIYSIGGIFITGLIITLGGIFVLN
TGFRLRSQNHSESIRYYLHPTIAKLFDIILTVFLFSLAIIMTAGGASTINESFGLPFWLS 
SLILVILILLTLFLKFDRLIAVLGGVTPFLVAVVVMIAVYYFITGDLNFSDVSQYSNQNK 
SISPGWWFDAINYASLQIAAAFSFLTVMGGKLRYQSSTIYGGLIGGIIVTLLLLLINFGL

VTEFNQIKEVALPSLLLAKQISPSIGIIMSVIMVLVIYNTVVGLMYAFASRFSRPFTKRY
YILIVMMAIITFACTFVGFISLIGKVFPIMGLFGFILLIPVIYKGILRK* 

Sequence 933 

Contig_0533_pos_6099_6896, 

is similar to (with p-value 2.0e-38)

>sp:sp|P54721|YFIE_BACSU HYPOTHETICAL 31.5 KD PROTEIN IN GLV
BC 3'REGION. >gp:gp|Z99108|BSUB0005_93 Bacillus subtilis com
plete genome (section 5 of 21): from 802821 to 1011250. NID:
g2633055. >gp:gp|D50543|D50543_7 Bacillus subtilis DNA for
76-degree region, complete cds. NID: g1486240.

atgacaaattttcattctatagatgctacacaagtaacaaacgtcactttaaatgttaaa 
gatttaaataaattaactgatttctattctaatgtattaggtttttctattcaaaaacaa 
acgaatcaacaaaccgtattcaacatcggaaatcttggttatactttaactttaaatgaa 
cttaacaacggtcgacaaccggaatttagagaagcagggttattccatgttgcttatctt 

35 ttaccgactcgtagcgatttagcagacttcctttatcatgctaacaatctcaacatcgca 
atgggtggtggagatcaccttgtcagtgaagcgctatatttcactgatcctgaaggcaat 
ggtattgaagtctatcatgatcgcccttcagaagactggttgtggcgagacggttttgtc 
aaaatggatacattggaagttgatgtcaatgatttaatgactcaacggtcaaatgaaggt
tggcaaagttggccggaagaaggaaaaatcgggcatttacatctcaaaacacacaattta 

40 gaatctgcttatgaattttatgttgaaaagttagggttcgaacatatatctaatttccca 
caagcactatttatgtccactcaaaagtatcatcatcatatagctacaaatacttggcag 
tcaaataagattagaactcaaaatgaacaaacttatggtttatgtcactttgacatatat 
caacctaatgcaaatactactcatgttacctcacctgaaggctttgacattacaattcat
ggtaacgaaacaaaataa 


Sequence 934 

MTNFHSIDATQVTNVTLNVKDLNKLTDFYSNVLGFSIQKQTNQQTVFNIGNLGYTLTLNE
LNNGRQPEFREAGLFHVAYLLPTRSDLADFLYHANNLNIAMGGGDHLVSEALYFTDPEGN 
GIEVYHDRPSEDWLWRDGFVKMDTLEVDVNDLMTQRSNEGWQSWPEEGKIGHLHLKTHNL 
50 ESAYEFYVEKLGFEHISNFPQALFMSTQKYHHHIATNTWQSNKIRTQNEQTYGLCHFDIY 
QPNANTTHVTSPEGFDITIHGNETK* 

Sequence 935 

Contig_0533_pos_5371_4442,

is similar to (with p-value 3.0e-24)

>gp:gp|Z71552|SPADCA_4 Streptococcus pneumoniae adcRCBA oper
on. NID: g3758891.

gtgtcaatcatcattttaatgttaagcggatgcagtagctttgatcatcgtaaacgcgaa 
agtattaatgacaagaataaaatgaaagtatacacgactgtatatgcatttcaaagtttg 
acacaacagattggtggaaaatatgttgacgcgcaatcaatctatcctgctggtgctgat
ttacactcatatgaaccaacacaaaaagatatgattgatattgccaaaagtgatctgttt 
gtctattcaagtcatcaattagatcctgtcgctgcaaagattacgaattcgatgaccaat 
aatagcatgaaattagcgcttgccgaaggactcaaacaaagtgattttattcactctaaa 
5 gaccatgatgaaaatcatgagcatcattcacatcatgaagaatcgaatcaagatcctcat 
gtttggttagatcctgttctaaatcaaaaattcgctttcatgattaaagagaaattaata 
gagaaagaccctaaacatcaagcttattacaataaaaattataaaatagtaaataaagat 
attgtgcatattgatcaacaactacaatcaataacgaagcattctaaaagagataaagtt 
gtgatatcacacgattcgcttggatatttagcgcatcgttatggttttaaacaacaaggt 
10 gttaaaggtatgaatgatgaagaacctagtcaaaaagagattttgaatatcgttaaagat 
atacagcattcacatgcgccttatgttttatatgaacaaaatattacctccaaaattaca 
gatgttattaagaaagaaacagatacgaaaccattaagttttcataatctagctgtattg 
actaaaaaggagcaaaatgatgattcaatttcataccaatcattaatgaaaaagaacatt 
tacgtattaaatcgcgcactcaataattaa 


Sequence 936 

VSIIILMLSGCSSFDHRKRESINDKNKMKVYTTVYAFQSLTQQIGGKYVDAQSIYPAGAD
LHSYEPTQKDMIDIAKSDLFVYSSHQLDPVAAKITNSMTNNSMKLALAEGLKQSDFIHSK 
DHDENHEHHSHHEESNQDPHVWLDPVLNQKFAFMIKEKLIEKDPKHQAYYNKNYKIVNKD 
20 IVHIDQQLQSITKHSKRDKVVISHDSLGYLAHRYGFKQQGVKGMNDEEPSQKEILNIVKD 
IQHSHAPYVLYEQNITSKITDVIKKETDTKPLSFHNLAVLTKKEQNDDSISYQSLMKKNI 
YVLNRALNN* 

Sequence 937 

Contig_0534_pos_5367_5684,

is similar to (with p-value 5.0e-23) 

>gp:gp|U56999|TPU56999_1 Treponema pallidum methyl-accepting
chemotaxis protein (mcp-1) gene, complete cds, and potentia
l regulatory molecule (pfoS/R) gene, partial cds. NID: g1354
774.

atggctacagttgacaatggtgatatgattataaaactaaacaccatagagattaacata 
ctcataaacactggttgtaattcagtaaagctattaaccatgtcaccaattcccgttgta 
attaaccttacatatggtaaagtaaatacgccgattgtagctgctaaaccaccaacaact 
gtcggaaaaacaattaaagccatactaccaacacgttcttctattaataaaataagtaac 
35 acggcgatggatgcagttaacatcgtattaattaaatccccaattcccgcaattacccat 
gtaccttgtttaaattga 

Sequence 938 

MATVDNGDMIIKLNTIEINILINTGCNSVKLLTMSPIPVVINLTYGKVNTPIVAAKPPTT
VGKTIKAILPTRSSINKISNTAMDAVNIVLIKSPIPAITHVPCLN*

Sequence 939 

Contig_0534_pos_8940_0,

is similar to (with p-value 1.0e-30)

>sp:sp|P25468|PYRD_SALTY DIHYDROOROTATE DEHYDROGENASE (EC 1.
3.3.1) (DIHYDROOROTATE OXIDASE) (DHODEHASE). >gp:gp|X55636|S
TPYRDDD_1 Salmonella typhimurium pyrD gene for dihydroorotat
e dehydrogenase (EC 1.3.3.1.). NID: g854623.

atgtacaaattagttaagcctttattattcaaattagatcctgaacgagcacatggtttg 
50 accatcaatgcgttgaagtgtgttcaaaaatgttcacccattttacctatcgttaataag 
ttatttacttataacaatccaatattaacgcaacacattcacggtatttcttttgataat 
cctatcgggttagctgcaggttttgataaatcttgtgaagttccaaaagcacttgaaaac 
attggcttcggtgcaattgaactcggcggtataacacctaagcctcaaccaggtaatcca 
aaaccacgcatgtatcgtttactagaagatgatgcactcatcaatcgtatgggattcaat 
55 aataagggtatgaataaagcactaagtaatttacgtaatcattcatgctcaataccagta 
ggattaaatgttggtgtgaataaaacaacttcctatgaaaatcgctatcaagattaca 

Sequence 940 

MYKLVKPLLFKLDPERAHGLTINALKCVQKCSPILPIVNKLFTYNNPILTQHIHGISFDN 
PIGLAAGFDKSCEVPKALENIGFGAIELGGITPKPQPGNPKPRMYRLLEDDALINRMGFN
NKGMNKALSNLRNHSCSIPVGLNVGVNKTTSYENRYQDYX 

Sequence 941 

Contig_0534_pos_8398_7859,

putative peptide of unknown function 

atgcatagccaataccaacaagataataaatttggattccgtttaccacatgaaggtgca 
gatatttcctttgataattcatggactgagacatggaaagagatttttataaatcgtaga 
atggatcacttacaagatgagttattacgtgtaggattgtggaaacaagaagataaaaaa 

10 atgtatgaacgtgtaagaaaagttattgttgatgaactttcaaatcatactagtaagccc 
tctctgttacatggtgatttatggggaggtaactacatgttcttaacaaatggccaacct 
gctttatttgatcctgcaccactatatggagatagagaatttgacataggaatcactaca 
gtatttggtggatttacacaagagttctatgatgaatataatcaacagttaccactagcc 
aagggatcacaaaagcgtatagaattttatagattatatttacttatgatacatttactt 

15 aaatttggtggtatgtatgctgatagtgtacaacgctctatgaaaatcattttagaataa 



Sequence 942 

MHSQYQQDNKFGFRLPHEGADISFDNSWTETWKEIFINRRMDHLQDELLRVGLWKQEDKK
MYERVRKVIVDELSNHTSKPSLLHGDLWGGNYMFLTNGQPALFDPAPLYGDREFDIGITT
VFGGFTQEFYDEYNQQLPLAKGSQKRIEFYRLYLLMIHLLKFGGMYADSVQRSMKIILE*



Sequence 943 

Contig_0534_pos_5971_4913,

is similar to (with p-value 6.0e-49) 

>gp:gp|U56999|TPU56999_1 Treponema pallidum methyl-accepting
chemotaxis protein (mcp-1) gene, complete cds, and potentia
l regulatory molecule (pfoS/R) gene, partial cds. NID: g1354
774.

atgtcgacgataaaaaatatcgatggaccaaaggattttgtttttagagtgttatcaggg 
gtagcaattggaatagtagccggactcgttccaaatgcaattttgggagaaatttttaaa 
tactttatgcaatatcatcctattttcaaaactttattaggggtcgttcaagccatccaa 
tttacagtgccagcgcttattggagcattgatagctatgaagttcaatatgacaccttta 

35 gcaatagctgtagtagcaagtgcctcatatgttggtagtggtgcagctcaatttaaacaa 
ggtacatgggtaattgcgggaattggggatttaattaatacgatgttaactgcatccatc 
gccgtgttacttattttattaatagaagaacgtgttggtagtatggctttaattgttttt 
ccgacagttgttggtggtttagcagctacaatcggcgtatttactttaccatatgtaagg 
ttaattacaacgggaattggtgacatggttaatagctttactgaattacaaccagtgttt 

40 atgagtatgttaatctctatggtgtttagttttataatcatatcaccattgtcaactgta 
gccatagctattgctattggattatcaggtattgctgcgggatctgcttcaataggtata 
gcagcgacagaagctgtattattgattggtaccagcaaagttaatcatgtaggtattcct 
ttatcaatatttttcggtggggtgaaaatgatgatgccaaatatggttaaataccctgtc 
attatgattccgattttcttgacagcggcaatatctggtattgcttcagggattattggt 

45 atttcaggaacaaaagaatcagcaggatttggttttatcggaatggttgggcctattaat 
gcctttaaatttatgcatgttgattctgcatggttaagtttattacttattgtcatcgcc 
ttttttgttgtgccgtttctagttgcatggattttagatttaatacttagaagattaatt 
catttgtatgagaatgatatttttaaatttatgggataa 

Sequence 944

MSTIKNIDGPKDFVFRVLSGVAIGIVAGLVPNAILGEIFKYFMQYHPIFKTLLGVVQAIQ
FTVPALIGALIAMKFNMTPLAIAVVASASYVGSGAAQFKQGTWVIAGIGDLINTMLTASI
AVLLILLIEERVGSMALIVFPTVVGGLAATIGVFTLPYVRLITTGIGDMVNSFTELQPVF 
MSMLISMVFSFIIISPLSTVAIAIAIGLSGIAAGSASIGIAATEAVLLIGTSKVNHVGIP

LSIFFGGVKMMMPNMVKYPVIMIPIFLTAAISGIASGIIGISGTKESAGFGFIGMVGPIN
AFKFMHVDSAWLSLLLIVIAFFVVPFLVAWILDLILRRLIHLYENDIFKFMG*

Sequence 945 

Contig_0534_pos_4812_3388,

is similar to (with p-value 4.0e-37)

>gp:gp|U12891|PAU12891_5 Pseudomonas aeruginosa PAO substrai
n OT684 pyoverdine gene transcriptional regulator PvdS (pvdS
) gene, complete cds. NID: g1580798.
atgtcagggaaactagaagaattacaattaaaagtagctcgattaagtcgacgtactcat
gaattaggtattccaattatggtattatttgaggggattcctgcttcggggaagacacgt 
ttatcaaatgaattactattgcacctagatgccaaatattcgcgatttatagctactaaa 
tcgccagagtcaaacgatttacgttaccaatttttacaaaaatattggaatactttacca 
caaaagggcaatataaatatttattttagaagttggtattcacactttttagattataaa 

10 gaaaataaaattaagcatgatcaatataaaaattatgatgttttagtcaatcaaatttat 
cattttgaatcgatgttaaagaatgataactatgaaattataaaatttttcatagaaata 
aatgaagaaaaacgcaatgaacatattcaacagacaaaagataatccattaactagatgg 
aaagttcaagaatatgaaaatgttatacctcaagaaagttatctaaatcaaatgcatcaa 
ttcatcaacaaagataaagattggaaagtgatcgattacacagagcgcgagcatgctttt 

15 gaaaaaatgtacttacatttaatagatagacttgagcaagctataaaaaaagttgaacaa 
caaacaactaaagtcaacggtaagttcacatcaagctttacgacttctttatttaataat 
aatcttgagaaagtagacaaaaaaacgtataaaaatctcattgttgaattgcaacagaga 
atgagagaaatccaatttgctttatatgaaagaaagattccccttgttttggttttcgaa 
ggtatggatgctgctggtaaaggtggcaatat taaacgtattagagaaaaattagatcca 

20 acaggatatgaagtgaatggtattagtgcacctacggatgtcgaacttaagcatcattat 
ttgtggagatttgctaaaaagatgccaaaatcaggtcatatagaaatatttgatcggagt 
tggtatggtcgtgtactagttgaacgtgtagaaggttttgcaagccagaatgaatggcaa 
cgagcatctgatgaaatcaatcaatttgaaaagatgtggacagatgaaggtacaatcata 
ttaaaattcttcttatgtttagataaagatgagcagcttaagcgttttaaagaccgtgaa 

25 aataatcctgataaacaatggaagattactgaagaagattggcgtaatagagaaaaatgg 
gatgaatatttagaagcaagtcatgatatgattgaatctacaaacacttcatatgcccct 
tggtatattgttccggcagatcataaaaaaacgagtcggattgaagtacttaaaacaatt 
attagaaaatgtgaagaagtactatggggagttaagacgtattaa 

Sequence 946

MSGKLEELQLKVARLSRRTHELGIPIMVLFEGIPASGKTRLSNELLLHLDAKYSRFIATK 
SPESNDLRYQFLQKYWNTLPQKGNINIYFRSWYSHFLDYKENKIKHDQYKNYDVLVNQIY
HFESMLKNDNYEIIKFFIEINEEKRNEHIQQTKDNPLTRWKVQEYENVIPQESYLNQMHQ
FINKDKDWKVIDYTEREHAFEKMYLHLIDRLEQAIKKVEQQTTKVNGKFTSSFTTSLFNN 

NLEKVDKKTYKNLIVELQQRMREIQFALYERKIPLVLVFEGMDAAGKGGNIKRIREKLDP
TGYEVNGISAPTDVELKHHYLWRFAKKMPKSGHIEIFDRSWYGRVLVERVEGFASQNEWQ
RASDEINQFEKMWTDEGTIILKFFLCLDKDEQLKRFKDRENNPDKQWKITEEDWRNREKW 
DEYLEASHDMIESTNTSYAPWYIVPADHKKTSRIEVLKTIIRKCEEVLWGVKTY* 

Sequence 947

Contig_0534_pos_2732_2025, 

is similar to (with p-value 9.0e-23) 

>gp:gp|U96108|SCU96108_5 Staphylococcus carnosus (3R)-hydrox
ymyristoyl acyl carrier protein dehydrase homolog (fabZ) gen
e, partial cds, YwpF homolog, single-strand binding protein
homolog (ssb), SceD precursor (sceD), SceA precursor (sceA)
and SceE precursor (sceE) genes, complete cds, and TenA homo
log (tenA) gene, partial cds. NID: g2735509.

atgaaaaaaacagttatcgcttctacattagcagtatctttaggaattgcaggttacggt 
50 ttatcaggacatgaagcacacgcttcagaaactacaaacgttgataaagcacacttagta 
gatttagcacaacataatcctgaagaattaaatgctaaaccagttcaagctggtgcttac 
gatattcatttcgtagacaatggataccaatacaacttcacttcaaatggttctgaatgg 
tcatggagctacgctgtagctggttcagatgctgattacacagaatcatcatcaaaccaa 
gaagtaagtgcaaatacacaatctagtaacacaaatgtacaagctgtttcagctccaact 
tcttcagaaagtcgtagctacagcacatcaactacttcatactcagcaccaagccataac
tacagctctcacagtagttcagtaagattatcaaatggtaatactgctggttctgtaggt 
tcatatgctgctgctcaaatggctgcacgtactggtgtatctgcttcaacatgggaacac 
atcattgctagagaatcaaatggtcaattacatgcacgtaatgcttcaggtgctgctgga 
ttattccaaactatgccaggttggggttcaactggttcagtaaatgatcaaatcaatgcc 
gcttataaagcatataaagcacaaggtttatctgcttggggtatgtaa

Sequence 948

MKKTVIASTLAVSLGIAGYGLSGHEAHASETTNVDKAHLVDLAQHNPEELNAKPVQAGAY
5 DIHFVDNGYQYNFTSNGSEWSWSYAVAGSDADYTESSSNQEVSANTQSSNTNVQAVSAPT 
SSESRSYSTSTTSYSAPSHNYSSHSSSVRLSNGNTAGSVGSYAAAQMAARTGVSASTWEH 
IIARESNGQLHARNASGAAGLFQTMPGWGSTGSVNDQINAAYKAYKAQGLSAWGM* 

Sequence 949

Contig_0534_pos_0_1250,

is similar to (with p-value 1.0e-51) 

>gp:gp|U93874|BSU93874_12 Bacillus subtilis cysteine synthas
e (yrhA), cystathionine gamma-lyase (yrhB), YrhC (yrhC), Yrh
D (yrhD), formate dehydrogenase chain A (yrhE), YrhF (yrhF),
formate dehydrogenase (yrhG), YrhH (yrhH), regulatory prote
in (yrhI), cytochrome P450 102 (yrhJ), YrhK (yrhK), hypothet
ical protein YrhL (yrhL), putative anti-SigV factor (yrhM),
RNA polymerase sigma factor SigV (sigV) and YrhO (yrhO) gene
s, complete cds, and YrhP (yrhP) gene, partial cds. NID: g19
34604. >gp:gp|Z99117|BSUB0014_194 Bacillus subtilis complete
genome (section 14 of 21): from 2599451 to 2812870. NID: g2
634966.

atgatgtttgttgtcacggttgttttaatatatacattactatttaaacccgaattaatt 
attagtataaaacatgatgctatagcagcacttttctatgtttctaactggtggtatatc 

25 atacaagacgtagactattttaatcaatttgctgtagcgccattaaagcatttgtggtca 
ttagctattgaagaacaattctatctattctttccatttatacttttaggcttattaaag 
tttttcaaaaagagaactacaatgattattctattaatcatctctttattatcattaact 
gcaatgataacgatacatatgtatacaggtaacaattctagagtttatttcgggactgac 
acacgtttacaaacgttattattaggatgcttactagcatttatttggccaccgttctct 

30 ttcagaaaggatatatctaaaggtgctaaagcaagtataagtgcaataggcatagtcgga 
atggcagtgctcatttatttgtttgtagtggttagtgatcaagataaatggatatatagt 
ggaggattttatgccatctcgttcttaacactgtttgtcattgcaagcgttgtgcatcca 
tcaagtgttttaaagaaaatactaagtttcaagttatttatttatataggtaagagatcg 
tatagtttatatttatggcactatcctatcattatttttatgaatagttatttcgtacaa 

35 ggtcagattccttggtttgtatatatttgtgaagtgatacttatgtttgtcatggctgaa 
gtatcttataaatttatcgaaacacctattagaaaaaatggatttaaagcattcacggtg 
ataccgaaaaatttaacaagattttcaagaacgattattgtgttaatcttgcttgttcct
tctgcattcatagtatttggtgcctatgatagtttgggtaaagagcatgataaacaacaa 
gctgcgaaacaaaaatcttttaaaacgaaccagaaagcaaaacctaaaaagccagatgaa 

aataatcaagataagtcttcacaacaacattttaatcctaaagaagcgtctccattattg
ctgggagattcagtaatggtagatatcggtcaagtctttagtgaaaaagtaccaaatgct 
aatattgatggaaaagttggccgacagttaattgagggtaaagatttaatcaatcaaaag 
taccaagattatactaaaaaaggtcagagtgttgtgatagaacttggtac 

Sequence 950

MMFVVTVVLIYTLLFKPELIISIKHDAIAALFYVSNWWYIIQDVDYFNQFAVAPLKHLWS
LAIEEQFYLFFPFILLGLLKFFKKRTTMIILLIISLLSLTAMITIHMYTGNNSRVYFGTD
TRLQTLLLGCLLAFIWPPFSFRKDISKGAKASISAIGIVGMAVLIYLFVVVSDQDKWIYS
GGFYAISFLTLFVIASVVHPSSVLKKILSFKLFIYIGKRSYSLYLWHYPIIIFMNSYFVQ
GQIPWFVYICEVILMFVMAEVSYKFIETPIRKNGFKAFTVIPKNLTRFSRTIIVLILLVP
SAFIVFGAYDSLGKEHDKQQAAKQKSFKTNQKAKPKKPDENNQDKSSQQHFNPKEASPLL
LGDSVMVDIGQVFSEKVPNANIDGKVGRQLIEGKDLINQKYQDYTKKGQSVVIELGT

Sequence 951 
Contig_0535_pos_3501_3932,

putative peptide of unknown function 

atgaagaatatggtaatcttgaataagcaaaaaaggatgatcagaatgaaaaaagcaata 
tttagtattattatttctcttattttagttctaactgctactggatgtagtaatagttct 
aaagaaaaaccaattaaaaaaagtgcattagaaattaatcctacaagtaaagctgttaat 
attacagtaaataaaaaagaaaataacaaacctgaaaaaattgggaaagtgtatcgatat
aaaaataacaatgcaaaagaaattactaacgacggtattaaaaaagatactaaagataca 
ttgatttggaaaggtgtagcaaacaaatacgataatgtaaaagatttattaggagaaagt 
attctttatgaagttaaatataaaaatggggatataaaaaaattcgagagaaaaattaaa 
tatactgaataa

Sequence 952

MKNMVILNKQKRMIRMKKAIFSIIISLILVLTATGCSNSSKEKPIKKSALEINPTSKAVN
ITVNKKENNKPEKIGKVYRYKNNNAKEITNDGIKKDTKDTLIWKGVANKYDNVKDLLGES
ILYEVKYKNGDIKKFERKIKYTE*

Sequence 953 

Contig_0535_pos_5601_6350, 

is similar to (with p-value 2.0e-17) 

>gp:gp|X13481|BTPGI2XX_4 Bacillus thuringiensis plasmid pGI2
with transposon Tn4430. NID: g3171732. 
atgaaaaagttaatgatgagcttattatgttgcacaatatgtggaacagttttaactaca 
ggaagtgtaaatgcacaaggtgatagctcaactaactctgagtcgttaaaggaacttcaa
aatgaaggtattgtgtctaaatcaattactgaacaacaatggcaacaaatgaaagctcaa 

gagcgtaaagatgaagcagaatttgagaaaacagctgaagtacaatggcagaaacaacaa
aaacaagatcgaatagatcgagaaaatcgttcaagaaaaaagaaatttcatttgaaaaaa
ggtgattttttcatcacaaataacgtaagttcaaaagggtttacaggtcatgctgcaata
tatactggaaaaggaaaagttaaagaagcgcctggatatggacaacctgtgagggtaaaa 
agttttagtgattggaaaaagagtactttgaaaaaaagaaaaggaagtcccaaacatcgt 

tatatcaaggtttatcgagcaccaaaaaaatatagaggtaaagctggaaactatgcgaaa
tctcattttaacggtgtaccttacagtataacgacgaatccatattctaaaagcgttaca 
tactgttcgaaacttgtttggcaatcgtattattatggtgccggacgattatcagttctt 
ccggttgttacatctcaatttattattgagccatatagtctcaataaatatattccatca 
aaagcagttaggtcttataaaagaagctaa 


Sequence 954 

MKKLMMSLLCCTICGTVLTTGSVNAQGDSSTNSESLKELQNEGIVSKSITEQQWQQMKAQ 
ERKDEAEFEKTAEVQWQKQQKQDRIDRENRSRKKKFHLKKGDFFITNNVSSKGFTGHAAI 
YTGKGKVKEAPGYGQPVRVKSFSDWKKSTLKKRKGSPKHRYIKVYRAPKKYRGKAGNYAK 
SHFNGVPYSITTNPYSKSVTYCSKLVWQSYYYGAGRLSVLPVVTSQFIIEPYSLNKYIPS
KAVRSYKRS* 

Sequence 955 

Contig_0535_pos_7096_7773, 

putative peptide of unknown function

gtgggcattttagtatcggggtcagggatagcgagtgtacaaacaaatataactcacgca
aaagaaagtcacgattcaactcctcaaaatattaaattagtgggaacgtatgatacttct 
caagttgattccaaaacgatgaaacaatttaaagaaatagaaaaagaagataataatttc 
cacataactaaacatggaaataaagtcgttgtagaagacaaattacctaatccagagaat 

aaaacttcaagttattcagctgatggtagtgctgaaaataatacaaaagtaattaatttc
tctgattttgttggtaatatggatgggaaagatgatggaaaaatatcggatgggataacc 
ttttatagtggtaaatcatataacggacaacacgatggtcaaaaagtaaaaaaagggact 
catgtacattgtaatagatttaacggaacaaaatctgatcatagatactggtcaaaaaaa 
catcctagagcttatgtagatttttataaaagtgattgctggtatcacgccaaagcttat 

aaatgttcttccttgggaaaaatgactaaatgcgatggtttgaatagtatttatagaaaa
ggtgtcaaagattgctcatcatggaaaggtaaacccaaacataaaaactggcctaaaaca 
gcatggtatagaaattaa 

Sequence 956 

VGILVSGSGIASVQTNITHAKESHDSTPQNIKLVGTYDTSQVDSKTMKQFKEIEKEDNNF
HITKHGNKVVVEDKLPNPENKTSSYSADGSAENNTKVINFSDFVGNMDGKDDGKISDGIT 
FYSGKSYNGQHDGQKVKKGTHVHCNRFNGTKSDHRYWSKKHPRAYVDFYKSDCWYHAKAY 
KCSSLGKMTKCDGLNSIYRKGVKDCSSWKGKPKHKNWPKTAWYRN*
Sequence 957

Contig_0535_pos_7777_8139,

putative peptide of unknown function 

atgaataaaatcttaaaaatattaataacttctattattgttatcattattaccttaaca 
gtttggacttttagtgtgattacttatcagaaacacaagagtgagaaaatcatcaatcac
gttatagaacgtaagggttgggataaaaaaataaaaaatgaaaaaatgagttttaatatt 
ataatgggatatgctgaaaaagatattgtttttaaagatcaaccatatagtgagtatgag 
tataacgtgacaccagcaccatggacagatgataaagaatataaggtgtggggggaaaca 
gatttacaaaagaaagactcctattataaatatcttttagaatcagaaccttacagaaaa 
taa

Sequence 958 

MNKILKILITSIIVIIITLTVWTFSVITYQKHKSEKIINHVIERKGWDKKIKNEKMSFNI 
IMGYAEKDIVFKDQPYSEYEYNVTPAPWTDDKEYKVWGETDLQKKDSYYKYLLESEPYRK
*

Sequence 959 

Contig_0535_pos_8145_0,

putative peptide of unknown function 

atgtataacataataaaaaatagagatcattcatggacttctagtaagattcaaggtaga
aatacagatggtggattagaatggtcaccagatcataaatcacttatttataaatatgat 
gcaacattaggtagacaaataaatactaatgacgtgttaactttacttcaagcaacagct 
aaaaactcaaatttacgttcaaatatcaatagtaatgaaaaacagttagcagaacgaggg 
tctaatgggtattctaaatctataattagagatgatggcgagaaatcttatttacttaac 

tcaaatcctattcaagtattagacttagtagaaccagataatggttacggtggacgtcaa
gtcagtcattctaacgttatatataatgaaaaaaattcttctatcgtaaatggtcaagtt 
ccagaagctaatggggcatccgcttttaatattgataaagttgttaaagctaatgcggca 
aataatggtattatgggtgttatctataaggcacaattatacttagcaccatacagtcca 
aaaggttacattgaaaaattaggccaaaatttaagcaataccaataacgtgattaatgtt 

tattttgtgccttctgataaagtaaatcctagtataactgtaggtaattacgaccatcat
acggtatattctggtgaaacatttaaaaatactatcaatgtaaatgataattatggatta 
aatacagtagcttctacaagtgatagtgcaattactatgaccagaaacaacaacgagtta 
gtaggtcaggctcctaatgttactaatagcacaaataaaattgtaaaagttaaagccaca 
gataaaagtggaaatgaaagtattgtttctttcacagtaaatataaaaccattaaacgag 

aaatatagaataacaacttcatcaagtaatcaaacaccagtgagaattagtaatattcaa
aacaatgctaacctttcaattgaagatcaaaatagagtaaaatcttcactcagcatgact 
aaaattttaggtacaagaaattatgtcaatgagtcaaataatgacgttcgtagtcaggtt 
gtaagtaaagtaaatagaagtgggaacaatgctacagttaatgttacaactacattttct 
gatggtacaactaatacaataaccgttccagttaaacatgtgttattagaagttgtacct 

actactagaacaacagtaagaggacaacaatttccaaccggcaaaggaacttccccaaat
gatttctttagtttaagaacgggaggtccagttgatgcgagaatagtttgggttaataat 
cagggacccgatataaatagtaatcaaattggtagagatttaacattacacgctgaaata 
ttctttga 

Sequence 960

MYNIIKNRDHSWTSSKIQGRNTDGGLEWSPDHKSLIYKYDATLGRQINTNDVLTLLQATA 
KNSNLRSNINSNEKQLAERGSNGYSKSIIRDDGEKSYLLNSNPIQVLDLVEPDNGYGGRQ 
VSHSNVIYNEKNSSIVNGQVPEANGASAFNIDKVVKANAANNGIMGVIYKAQLYLAPYSP
KGYIEKLGQNLSNTNNVINVYFVPSDKVNPSITVGNYDHHTVYSGETFKNTINVNDNYGL
NTVASTSDSAITMTRNNNELVGQAPNVTNSTNKIVKVKATDKSGNESIVSFTVNIKPLNE
KYRITTSSSNQTPVRISNIQNNANLSIEDQNRVKSSLSMTKILGTRNYVNESNNDVRSQV 
VSKVNRSGNNATVNVTTTFSDGTTNTITVPVKHVLLEVVPTTRTTVRGQQFPTGKGTSPN 
DFFSLRTGGPVDARIVWVNNQGPDINSNQIGRDLTLHAEIFFX

Sequence 961

Contig_0535_pos_5483_5088,

putative peptide of unknown function 

atgcaaagaaaatacaaaattataggtattatttttatcgttcttcttattgttttaaca 
ttaatttttagtatagtgcatcattatgctaatgttcaaaaacatgaagaagctaaacta 
agacaaaaagttcaccatattttcaaacaaaaaggttgggaagataaagttaaagaagaa
aaaaatatatttacgttcaatactggggataatgatttacaagtcacttttaaagatgag 
ccttataatacgtatacatactctattgatgaaaacaataaagtatatggacatgctgtt
ttgaaagatgaatatgataaagattttgatagtaaaaaaaagtacaaagaatatttaaga 
aaaatgcattttgaagaaaaatatgatctgaaataa

Sequence 962 

MQRKYKIIGIIFIVLLIVLTLIFSIVHHYANVQKHEEAKLRQKVHHIFKQKGWEDKVKEE
KNIFTFNTGDNDLQVTFKDEPYNTYTYSIDENNKVYGHAVLKDEYDKDFDSKKKYKEYLR
KMHFEEKYDLK*

Sequence 963 

Contig_0535_pos_4528_4172,

putative peptide of unknown function 

atgataaagttactttcagtttttattgcaagtttattagttttaaccggattatctttt
tcttcaattaatcaaggtaatactgcgagtgctaaaacaaaatataaaacaactataact 
tacaaaggtcaaaaatatgtatatgttggtcattataaacaccatttttctaaaaaagta 
gttaaattttctaaaggagttcattcaggaaacaaattagtatctgctgctagtaaactt
agtaaaaatggttatgttaaagcatcctcggctatttataaagcctttgattttggattg
aaaaatgaattaaaaggtagctatttttacacagcagctaaaaaggtacaggcgtaa

Sequence 964 

MIKLLSVFIASLLVLTGLSFSSINQGNTASAKTKYKTTITYKGQKYVYVGHYKHHFSKKV
VKFSKGVHSGNKLVSAASKLSKNGYVKASSAIYKAFDFGLKNELKGSYFYTAAKKVQA*


Sequence 965 

Contig_0535_pos_3277_2618,

putative peptide of unknown function 

atgtcacaatcgacacaccaatcaactgatcataaaagacaatctcaagataagaaccaa 
acacatgcctatgcaaaagtttggttatattttatgtattattggatgatttttgggatt
ggttgttattttggacaatatcttcctatgtcatggagaaaaccattgtctctaggattg 
ttaattctgatattagcaacgttgtttattaaacgtgcgcgaaaatatggattagtgatc 
tctcatatatacgctattattgtaggtttgctttcgtatgctctttttacgacttactta 
caaaatttaggagcagaagtattctataaaaatattattcttgctattggtgcatttata 
gcttttggtattataggttactttttaataaaagacgcttcgagtatgggaaaatatttg
tttgtgacattaattgcgctaattatagctgggattatagggatatttattaataatcca 
atttttcatactgtcattacaatagtgagcttattattgtttctcctttatactttatac 
gattttaatagaatgaaaagaggtcaattctcaccaagagaaatgggatttaatttgttt 
atcaacttattaaatatcatcgaagatatacttagtttagcaaatcgctttaaaaactaa 


Sequence 966 

MSQSTHQSTDHKRQSQDKNQTHAYAKVWLYFMYYWMIFGIGCYFGQYLPMSWRKPLSLGL 
LILILATLFIKRARKYGLVISHIYAIIVGLLSYALFTTYLQNLGAEVFYKNIILAIGAFI
AFGIIGYFLIKDASSMGKYLFVTLIALIIAGIIGIFINNPIFHTVITIVSLLLFLLYTLY
DFNRMKRGQFSPREMGFNLFINLLNIIEDILSLANRFKN*

Sequence 967 

Contig_0535_pos_2355_181,

putative peptide of unknown function

atggcaaattcttgtttgcatatactttcaaaaaaagaatatacggcaacacgatgtcaa 
gacggcattttattattttggcctatcgaagggagtatgcactttcaacaatttatgaaa 
gaaaggatactctcagatgagttatatattgtgaataatatggatgtgtttagtatcagt 
gacaatggcatcacactagaagtatatatttctagtgattggtttacagagttaggctat 

tctttttttaattaccattatatttcggatttaatacaatctaagaaagaaattaaagaa
ctagttgctcaacttacgttgaattttttagataatgatgtggataaagagcaagatatt 
atcaataaaattgttcatattcttgctaatgaggctattattgacaaaaaaattgctgaa 
gaccaatatatgtatgattattatggtgagttaaaggatgaattgaattatatatataat 
cacatcgaagaaagacttactctaaaagatatttccaataaattatatgtttctaagtcc 
aacctttctacgcagtttcatttgttattaggtatgggatttaaaaaatatatagataca
cttaaaattagtaaatctatagagatgctacttacgacaactaaaacgataagccaaatt 
agtgaaacgttaggatttagtaatgtatctacatattctagacaattcaaaaattattta 
agtgtaacgcccaatgcatatcgtgcaatgaaaaaatatgataagtacaatggatgttct 
gatgatgatgtttcagaacacttaaaatcatgtgtacaatcattaatatgttctaaaatg
ccaacgaatgagttagataattatgatgaaattgttattgaccaatatccaatttctaat 
gtttcaacgttttattctgtcgttcaaattaattcaattgatgaaattaaaatgttgttt 
ttgcaaggtattcataaaaaaatagggtatgaaggttcaaatattattttttgtattatg 
cctaacttatgccaatataagaatttgttctctcaagaagagatgaatgatatcattaaa 

attattattgaatatcgcttgcacgtcgcatttagcatcgataaaatcgagcaaatatat
gaacttaatcaactttttacatatcaatatgaaaacttaaaaattatgaataaatgttct 
gtttcagattacaatgtgcaatttatatttaatttgaacgaaaaaagtattcgagaaatt 
tatcgtaatatcttgaagatacaaaacatcgaattggaatataaaataggtttagatatt 
agttgcatgtttaatgacactgcacaatttaaatcattagcttcgcaaataaagcgttta 

aaatttgactatctttatattgataatgctagattaaagtcaccctatttacttgataat
gaggaaggtttattactcaaaaatattctacaatttaagcatttaatagatgatttaaag 
cagtttgattttagtagtgaaaatttaatttttctaaatttatataatcatcaattactg 
aataataatgaaattgatttaagtaatagcgctccattactatttaaaacgatttcaaaa 
ctaaaaaaacattttaaaggctacggattaaatgtattttcaaatcctaaagtctttaac 

gctgtacatttatttgatgagaatgggtttaaaacaacgtttggactgatttttaatcat
ttgagttggatgactaatcaaaaccaaattgaacaacgattctataatattattgaaaat 
gctgatcaatattatctctacttatatgattggcgtgtgattgaaagtgaatctaatgag 
agcgactttaaagacgttgatatatggattaactttgaagatgaagcgttaatagatgaa 
tatatttgtgtgattgctaaagttgatgatgaaggtggcaatattaatcatatgatttct 

caaaacttacgtcacaaatatgtttggtctacaccgttcttgatgagagttgaggagaac
tttagaccatacatgcacattatggaacatgactttaaaaagggcccattgaaaatcaga 
atgaaatataatgcagtatatgtagttgaaatatataaaaaagataaaataaataaaagg 
cgtagcacaacttaa 

Sequence 968

MANSCLHILSKKEYTATRCQDGILLFWPIEGSMHFQQFMKERILSDELYIVNNMDVFSIS 
DNGITLEVYISSDWFTELGYSFFNYHYISDLIQSKKEIKELVAQLTLNFLDNDVDKEQDI 
INKIVHILANEAIIDKKIAEDQYMYDYYGELKDELNYIYNHIEERLTLKDISNKLYVSKS
NLSTQFHLLLGMGFKKYIDTLKISKSIEMLLTTTKTISQISETLGFSNVSTYSRQFKNYL 

SVTPNAYRAMKKYDKYNGCSDDDVSEHLKSCVQSLICSKMPTNELDNYDEIVIDQYPISN
VSTFYSVVQINSIDEIKMLFLQGIHKKIGYEGSNIIFCIMPNLCQYKNLFSQEEMNDIIK 
IIIEYRLHVAFSIDKIEQIYELNQLFTYQYENLKIMNKCSVSDYNVQFIFNLNEKSIREI
YRNILKIQNIELEYKIGLDISCMFNDTAQFKSLASQIKRLKFDYLYIDNARLKSPYLLDN 
EEGLLLKNILQFKHLIDDLKQFDFSSENLIFLNLYNHQLLNNNEIDLSNSAPLLFKTISK

LKKHFKGYGLNVFSNPKVFNAVHLFDENGFKTTFGLIFNHLSWMTNQNQIEQRFYNIIEN
ADQYYLYLYDWRVIESESNESDFKDVDIWINFEDEALIDEYICVIAKVDDEGGNINHMIS 
QNLRHKYVWSTPFLMRVEENFRPYMHIMEHDFKKGPLKIRMKYNAVYVVEIYKKDKINKR
RSTT* 

Sequence 969

Contig_0536_pos_1715_2401,

is similar to (with p-value 7.0e-31) 

>sp:sp|P39787|DNAD_BACSU DNA REPLICATION PROTEIN DNAD. >gp:g
p|L47709|BACYPIA_25 Bacillus subtilis (clone YAC15-6B) ypiAB
F genes, qcrABC genes, ypjABCDEFGHI genes, birA gene, panBCD
genes, dinG gene, ypmB gene, aspB gene, asnS gene, dnaD gen
e, nth gene and ypoC gene, complete cds's. NID: g1146223. >g
p:gp|U11289|BSU11289_2 Bacillus subtilis 168 asparaginyl-tRN
A synthetase (asnS) and endonuclease III (jooB) genes, parti
al cds and DnaD protein (dnaD) and (jooC) genes, complete c
ds. NID: g533096. >gp:gp|Z99115|BSUB0012_175 Bacillus subtil
is complete genome (section 12 of 21): from 2195541 to 24092
20. NID: g2634478.

atggatctaattcaattaaaaacaagacctgttgttataagacgagaattgtttgatcat 
tattcagagttaggtttggatgaacaagatttagttattttgataaaacttttatatgca
tctgaaacttctaataagcaaccttctattgaatttcttcaaaaaggatcaactatggaa 
cctcgtcaaattacttccgtaatacaaaacttaattcaaagagaattattagaactcaat 
gttagtaaagacgaagaaggtaaattcactgaatacatgaatttggatcccttctatcac 
aaattaaatcaattattaaaacatcaatacttaaaacatgaggaacaagataaaaaagag
cagtttaagcaattgtttcagatagttgagcaatcgttcggcagaccactatcgccgtat
gaaattgaaacattaaatcagtggattgatgtcgatcaccatgacttatcagttatacaa
gccgctcttgatgaggcacttagccaaaataaacttagttttaaatatattgatcgtatt 
ttattaaattggaaaaagaataatgtgaaaacagttgacgattcaaagaaaataagagaa 
cagtttaacaaaccaaaaatgaaacatgttgttaaaaaggtgcctaaatttgactggttg
aatggagagaatcctaatgataagtaa 

Sequence 970 

MDLIQLKTRPVVIRRELFDHYSELGLDEQDLVILIKLLYASETSNKQPSIEFLQKGSTME 
PRQITSVIQNLIQRELLELNVSKDEEGKFTEYMNLDPFYHKLNQLLKHQYLKHEEQDKKE
QFKQLFQIVEQSFGRPLSPYEIETLNQWIDVDHHDLSVIQAALDEALSQNKLSFKYIDRI 
LLNWKKNNVKTVDDSKKIREQFNKPKMKHVVKKVPKFDWLNGENPNDK* 

Sequence 971 
Contig_0536_pos_2418_3050,

is similar to (with p-value 8.0e-76) 

>sp:sp|P39788|END3_BACSU PROBABLE ENDONUCLEASE III (EC 4.2.9
9.18) (DNA-(APURINIC OR APYRIMIDINIC SITE) LYASE). >gp:gp|L4
7709|BACYPIA_26 Bacillus subtilis (clone YAC15-6B) ypiABF ge
nes, qcrABC genes, ypjABCDEFGHI genes, birA gene, panBCD gen
es, dinG gene, ypmB gene, aspB gene, asnS gene, dnaD gene, n
th gene and ypoC gene, complete cds's. NID: g1146223. >gp:gp
|U11289|BSU11289_3 Bacillus subtilis 168 asparaginyl-tRNA sy
nthetase (asnS) and endonuclease III (jooB) genes, partial c
ds and DnaD protein (dnaD) and (jooC) genes, complete cds.
NID: g533096. >gp:gp|Z99115|BSUB0012_174 Bacillus subtilis c
omplete genome (section 12 of 21): from 2195541 to 2409220.
NID: g2634478.

atgattgacgttatagcagatatgtttcctaatgcagaatgcgaattaaaccatagaaat 
gcattcgatcttacaatagctgtattattatcagcacagtgtactgataatctagtcaat
cgtgtcactcaatcattatttagaaaatatcgaacacctgaagattatttaaatgtgagt 
gatgaagaattacaaaatgatatacgctctattggattatatcgcaataaagccaaaaat 
ataaaaaaattatgccactctttaattgaacaatttaatggtcaaatcccacaaacacat 
aaagaattagagagtctagctggagtggggcgtaaaacagcaaatgttgtaatgagtgtc 
gcatttggagaaccttctttagctgtcgatactcatgttgagagagtttctaaacgtttg
ggaattaatcgttggaaagatagtgtaagacaagtagaagatcgattatgtgatattatc 
ccaagagatagatggaataaaagccatcatcaattaatattttttgggagatatcattgt 
cttgctagaaaacctaaatgtgagatatgtccgctgttaaatgattgtagagaaggacaa 
aaacgacataaagcaaagataaaggaggcgtga 


Sequence 972 

MIDVIADMFPNAECELNHRNAFDLTIAVLLSAQCTDNLVNRVTQSLFRKYRTPEDYLNVS 
DEELQNDIRSIGLYRNKAKNIKKLCHSLIEQFNGQIPQTHKELESLAGVGRKTANVVMSV 
AFGEPSLAVDTHVERVSKRLGINRWKDSVRQVEDRLCDIIPRDRWNKSHHQLIFFGRYHC
LARKPKCEICPLLNDCREGQKRHKAKIKEA*

Sequence 973 

Contig_0536_pos_3055_3387,
putative peptide of unknown function 
atgattgaaaaacaggatttcaatcatatagaggaccaacttgatcaactagcaagtaat
aaacaactcaaaacaccagaagctagggaacttttagatagttatttcgatttaattatt 
aattattttaaacaaataaataacatagatgaaattcattttaatcaactcgatacatat 
ccagtagttccaatgaattttgatgaacgctatcattatatggttgcacgtaaacaccat
tttatgggctatcgtcaaatgaaaacattgaaatcagaattaataaaaatgaatgcatct 
tatctaattagaaagcaacgtcaacaaaaataa
Sequence 974 

MIEKQDFNHIEDQLDQLASNKQLKTPEARELLDSYFDLIINYFKQINNIDEIHFNQLDTY
PVVPMNFDERYHYMVARKHHFMGYRQMKTLKSELIKMNASYLIRKQRQQK*

Sequence 975 

Contig_0540_pos_1272_2228,

is similar to (with p-value 6.0e-27) 

>gp:gp|AF076683|AF076683_2 Staphylococcus aureus oligopeptid
e transporter putative substrate binding domain (opp-1A), ol
igopeptide transporter putative membrane permease domain (op
p-1B), oligopeptide transporter putative membrane permease d
omain (opp-1C), oligopeptide transporter putative ATPase dom
ain (opp-1D), and oligopeptide transporter putative ATPase d
omain (opp-1F) genes, complete cds; and unknown gene. NID: g
3800817.

atgctcaaacgtacaattaaattcatactttatttaatcgtaagttcgtttattatcttc 
attttagttgagaagacatctggtaatccagcgattctgtatctacaacgtcatggttat 

acgtcgattacgcaagacaatattgaagcggcacaacatcaacttggcttaggacaacat
gtgttactaagatatatcgattgggttggacatgcactcacgggcaacttaggatacggc 
tttagtacgaacgaagcagttaccgctatgataatggaagccatcgtgccgacgcttgtg 
ctaatcattgtctctagttgtatcatgttgccatttggctatattgttggttacttcgtt 
gggacgcgtccgcatacacgttacgctaatggaattcgtggattcgcccaagtgatgacc 

tcaatgccagaatactggttagctattttattcatttattatttaggcgtacgttggcaa
ttgttaccatttgtaggtagtgattcatggcaacactttgtgctgccaatcttcacaatt 
gttgttatagaagggtgtcatatcttattgatgacagcacatctgattacacaaacgtta 
gatcaagatgcgtatcaactggcgcagttaagacatttttcgttaaaagcgcgtatcatc 
gtacaaattaaagagatatttgcaccactaatgacgatttcaattaacagtatcattcat 

ttaattggaaaagccgtaatactagaagtcatcttcagcatgtctggtataggtaaattg
ttgattaatgctattaaccaacgagattatccactgattcagggcattgtcatctttatc 
attgtctttattatgctaatgaattatttaggcgatgtgattattttgaagaatgaacct 
agacttcgacgacgtcatacccagcagtcaggcaatgagaaaagaggtacgatgtga 

Sequence 976

MLKRTIKFILYLIVSSFIIFILVEKTSGNPAILYLQRHGYTSITQDNIEAAQHQLGLGQH 
VLLRYIDWVGHALTGNLGYGFSTNEAVTAMIMEAIVPTLVLIIVSSCIMLPFGYIVGYFV
GTRPHTRYANGIRGFAQVMTSMPEYWLAILFIYYLGVRWQLLPFVGSDSWQHFVLPIFTI
VVIEGCHILLMTAHLITQTLDQDAYQLAQLRHFSLKARIIVQIKEIFAPLMTISINSIIH
LIGKAVILEVIFSMSGIGKLLINAINQRDYPLIQGIVIFIIVFIMLMNYLGDVIILKNEP
RLRRRHTQQSGNEKRGTM* 

Sequence 977 

Contig_0540_pos_2276_2995,
is similar to (with p-value 5.0e-32)

>gp:gp|U64514|BFU64514_3 Bacillus firmus dppABC operon, dipe
ptide transporter protein dppA gene, partial cds, and dipept
ide transporter proteins dppB and dppC genes, complete cds.
NID: g1813494.

atggttgtattaattacgtatggtttaatgcaagacacgcaacatttgaacccacttgag
tcacctaatggacaacattggttgggtaccgatcaattaggcagagacttcttagtaaga 
ctgattgtcggtagtcttgtcacattgagtttaacaggcatagtgattctattaagcgtt 
tgtatgggacttatctttggcttaattgcaggcatagaaagacgatggttagatcaaatc 
atcatgtttgttgccgatatgttgctggctattccgtcatttattatcgcattagtcatc 

ttaagtttagtaagtaactccatgataggtttgatacttgctttaacgattggatggata
ggacgttatttacgttacttcagaaatttaacgcgagatattcaaaaacgtccatttgtt 
caatatgcacgattgagtgggaactcaacattcaaaacgacagtaacacatgtgattcca 
catttattaagtagtatattcgctttggtaacggctgactttggcaaaatgatgctcagc 
atatctggacttgcttttctaggactaggtattaaaccgccgacgcctgagttaggaaca 
attctttttgatgggaaaagttatttcaacggcgcaccgtggctcttcttcttccctggt
gtattgttaggaggtttcgccttattatgtcaaattatcaacaaaaaaataacgcagtaa 



Sequence 978

MVVLITYGLMQDTQHLNPLESPNGQHWLGTDQLGRDFLVRLIVGSLVTLSLTGIVILLSV 
CMGLIFGLIAGIERRWLDQIIMFVADMLLAIPSFIIALVILSLVSNSMIGLILALTIGWI
GRYLRYFRNLTRDIQKRPFVQYARLSGNSTFKTTVTHVIPHLLSSIFALVTADFGKMMLS
ISGLAFLGLGIKPPTPELGTILFDGKSYFNGAPWLFFFPGVLLGGFALLCQIINKKITQ*


Sequence 979 

Contig_0540_pos_3148_3729,

is similar to (with p-value 2.0e-19) 

>sp:sp|P45095|DPPD_HAEIN DIPEPTIDE TRANSPORT ATP-BINDING PRO
TEIN DPPD. >pir:pir|F64188|F64188 dipeptide transport ATP-bi
nding protein (dppD) homolog - Haemophilus influenzae (strai
n Rd KW20) >gp:gp|U17295|HIU17295_3 Haemophilus influenzae d
ppB, dppC, dppD, dppF, isn, artP, artI/J, artQ, and artM gen
es, complete cds, and opa gene, partial cds. NID: g972894. >
gp:gp|U32798|U32798_3 Haemophilus influenzae Rd section 113
of 163 of the complete genome. NID: g1574110.
atgaaacaatcacaattatgttatcaaggagatattgacatcgatttaactcaaacagat
gcagtgtttcaagatgttcaaagtaatatgtttcaaaatataacattagctaagcatttc 

caatacatttatgaagccaatcgcacacatctcactaaacagcgtattaaggaagatgtc
ttacagatgatgcaattacttggtttaagacaaggggaacaattgcttgagcgttatccc 
ttcgaacttagtggaggtatggcacaacgtgtcgcctttataatgtcattaattagacgt
ccgaactacttatttttagatgaaccaacgagtgcacttgatcaagaaaatattaaaaag 
tttatgcattaccttcttagggcacaggagcgctaccaaatgaccattgtttttatcaca 

catgatattaacttagtgaaagattgtgccacacatattagtattatgcagcaaggtaaa
ttgatagaaaatggtgaggccgcgtcgatcttaactaagccgacacataattacacgaaa 
aaattaattacgattgcacatcggagacaaccttatgcttaa 

Sequence 980 

MKQSQLCYQGDIDIDLTQTDAVFQDVQSNMFQNITLAKHFQYIYEANRTHLTKQRIKEDV
LQMMQLLGLRQGEQLLERYPFELSGGMAQRVAFIMSLIRRPNYLFLDEPTSALDQENIKK 
FMHYLLRAQERYQMTIVFITHDINLVKDCATHISIMQQGKLIENGEAASILTKPTHNYTK 
KLITIAHRRQPYA* 

Sequence 981

Contig_0540_pos_5018_5803,

putative peptide of unknown function 

atgcaacattcaagcaaaataatagtatttgtaagtttcttaattttaacgatttttatt 
ggaggatgtggttttataaataaagaagatagcaaagaaacggaaatcaaacaaaacttt 

aataaaatgttagacgtgtatccaactaaaaatctagaagacttttatgataaagagggc
tatcgtgatgaagagtttgataaagatgacaaaggaacatggattattaggtctgaaatg 
acaaaacagccaaaaggtaaaattatgacctcaagaggtatggttctctatatcaatcgc 
aacactagaacagccaaagggtattttttaatagataagataaaagatgatagtaatggt 
agaccgatagagaatgaaaagaaataccctgtaaaaatgaaccataataagatctttcca 

acaaagccaatatctgatgataagttaaaaaaagaaattgaaaacttcaaattttttgtg
caatatggaaattttaaaaacttaaaggattataaaaacggggatattttatacaatcct 
aatgttcctagttattctgcgaaatatcaattgagtaataatgaatataacgtacaacaa 
ttaagaaaaagatatgacatcccaactaaaaaagcacctaaactattgttaaaaggggat 
ggcgacttaaaaggatcatccgtaggtcatagagacctagaatttacctttgtagagaat 

aagaaagaaaacatcttttttacggatagtattaattttaaaccgactgagcgtgatgaa
tcatga 

Sequence 982 

MQHSSKIIVFVSFLILTIFIGGCGFINKEDSKETEIKQNFNKMLDVYPTKNLEDFYDKEG 
YRDEEFDKDDKGTWIIRSEMTKQPKGKIMTSRGMVLYINRNTRTAKGYFLIDKIKDDSNG
RPIENEKKYPVKMNHNKIFPTKPISDDKLKKEIENFKFFVQYGNFKNLKDYKNGDILYNP 
NVPSYSAKYQLSNNEYNVQQLRKRYDIPTKKAPKLLLKGDGDLKGSSVGHRDLEFTFVEN 
KKENIFFTDSINFKPTERDES*


Sequence 983 

Contig_0540_pos_6075_7043,

is similar to (with p-value 2.0e-20) 

>gp:gp|AJ222587|BS16829KB_25 Bacillus subtilis 29kB DNA frag
ment from ykwC gene to cse15 gene. NID: g2632216. >gp:gp|Z99
111|BSUB0008_93 Bacillus subtilis complete genome (section 8
of 21): from 1394791 to 1603020. NID: g2633699.
atgacttactgctcattatctataaagatttatacattgaaaatattaattgttttaaca 
ataggaggtcatgttgttattatgagtcagtttaaggacacattatataaactatttgag 

ccaatgatgaaaatagagttctatcaaaatcttttggttaatcttttaattatacttgct
tatatcttgatgggtatgattgtaattgcgatatcaagaaagttagttactaaatttttc 
aacgttaatgaaaagaaaaagaaccgtcataaaattaagagaagtgaaacactatccaca 
ttgattcaaaatttaataagttatgtcgtatggtttattgtccttacgtcaatactttca
cgtttcggtattagtgtatcagcaattttagcaggagctggagttgttggtgttgccgtt 

ggtttcggagcacaaacaattgtaaaagacattattactggtttctttatcatatttgaa
ggacagtttgatgtgagtgattatgttcaaattaatgcatctggggtaacaattgctgaa 
ggtacggttaaaacgattggtttaagatcaacgcgtatacaatcagatactggagaaatt 
tatacattacctaatggtatgattagtgaaatagttaattattctgctacagatgtttca 
cctattgtgatgataccgatttctccaaatgagaattataaagtgatagaagagaaatta 

ttaacatttttacctacattaaagaataaatatgacatatttgtatccgcaccagattta
cttggtttagatagtgttgatggcaatgaaatggtgattaaacttttagcacatgttaag 
cctggaatgcattttccaggacaacgtttacttcgtaaagaggtcatacaatactttagt 
gaagaaggcattcatattccgaaaccaacacttgtaaaacttgataaagaattgaataaa 
aaagaatag 


Sequence 984 

MTYCSLSIKIYTLKILIVLTIGGHVVIMSQFKDTLYKLFEPMMKIEFYQNLLVNLLIILA 
YILMGMIVIAISRKLVTKFFNVNEKKKNRHKIKRSETLSTLIQNLISYVVWFIVLTSILS 
RFGISVSAILAGAGVVGVAVGFGAQTIVKDIITGFFIIFEGQFDVSDYVQINASGVTIAE 
GTVKTIGLRSTRIQSDTGEIYTLPNGMISEIVNYSATDVSPIVMIPISPNENYKVIEEKL
LTFLPTLKNKYDIFVSAPDLLGLDSVDGNEMVIKLLAHVKPGMHFPGQRLLRKEVIQYFS 
EEGIHIPKPTLVKLDKELNKKE* 

Sequence 985 
Contig_0540_pos_685_182,

is similar to (with p-value 4.0e-46) 

>sp:sp|P54452|YQEG_BACSU HYPOTHETICAL 20.1 KD PROTEIN IN NUC
B-AROD INTERGENIC REGION. >gp:gp|D84432|BACJH642_91 Bacillus
subtilis DNA, 283 Kb region containing skin element. NID: g
2627063. >gp:gp|Z99117|BSUB0014_48 Bacillus subtilis complet
e genome (section 14 of 21): from 2599451 to 2812870. NID: g
2634966.

atgccaaatgcatatgtgaaatcaatatttgaaattgatatagaaaaacttgccgatagt
ggtgttaaaggtatcataactgatttagataatacacttgttggttgggatgttaaagaa 

cctactaagggtgttaaatcatggtttgctaaggctaaagatttaggaataactgtcaca
attgtgtcaaataataataaaagtcgagtatcaagtttctcaagtaatttaggtgtagat 
tatatattcaaagcacgtaaaccgatggggaaagcctttaagatggctattaaaaaaatg 
aaaattcaaccgagagaaaccgttgttgtaggagatcaaatgcttactgatgtgtttggt
ggcaattgtaatggtttatatacaattatggtagtacctgttaaacggactgatggatta 

attacaaagtttaatcgattaattgaaagacgattattaaatcattttagaaaaaaaggt
tatattaaatgggaggaaaattga 

Sequence 986 

MPNAYVKSIFEIDIEKLADSGVKGIITDLDNTLVGWDVKEPTKGVKSWFAKAKDLGITVT
IVSNNNKSRVSSFSSNLGVDYIFKARKPMGKAFKMAIKKMKIQPRETVVVGDQMLTDVFG
GNCNGLYTIMVVPVKRTDGLITKFNRLIERRLLNHFRKKGYIKWEEN* 

Sequence 987 
Contig_0541_pos_1165_1485,

putative peptide of unknown function 

atgacaggaagaacgaataatacgaataaccacgcccaagttgaactggcagtacgtcga 
tcacgttcaaggaaaatgatgacaaatgccaatactacgttagttataaacccaactact 
aacaatatagttaaaatagtacccaaaaatccaaaattcattactaaatactcctattca 
gtctttcaccaagaaaatttagccgccgatttactgaactatgcttccgttaccaatgtt
aaaatagcattaaagtcctcgtcattatgcggaaatgcacgttcttcgatttgtttaata 
gtaatgtttaataatttgtaa 

Sequence 988 

MTGRTNNTNNHAQVELAVRRSRSRKMMTNANTTLVINPTTNNIVKIVPKNPKFITKYSYS
VFHQENLAADLLNYASVTNVKIALKSSSLCGNARSSICLIVMFNNL*

Sequence 989 

Contig_0541_pos_307 6_3672, 

putative peptide of unknown function

atgtggaggtatgcgcattggaagaattggttaaaattgactgtttcatcagtctatatt 
attagtttagttttaacacttttatttcaagttagtctattaaatgagaataaaacaaat 
caaatagaacatgcatcaactatgaaagaaaagtctaatataaataatgtaaaaacaact 
aaaaataaaaatatggaaaaatcaacgcagacagacaaacaaaactctgtgaacttaaag 

caaaacacaaaagatcaaaataataacgcaaatgatgaagcagcttctccaactagcgaa
caaaatgcagctatagcacaagcaaagtcatatgcaaatacattacctatctctaagaaa 
agtttatacaaacaattaacttcggaatacggagagaaatatccggcagacatagcacag 
tatgctgttgaccatatcagtgtagattataaaatgaatgcactgagattagcaaaaagt 
tacgtaaaaaatataaacatttctaatcaagcgttatatgatcaactcgtttcagaaaat
ggagaaggatttactcctgaagaagcacaatatgcaatgaatcatttagataggtaa

Sequence 990 

MWRYAHWKNWLKLTVSSVYIISLVLTLLFQVSLLNENKTNQIEHASTMKEKSNINNVKTT
KNKNMEKSTQTDKQNSVNLKQNTKDQNNNANDEAASPTSEQNAAIAQAKSYANTLPISKK 
SLYKQLTSEYGEKYPADIAQYAVDHISVDYKMNALRLAKSYVKNINISNQALYDQLVSEN
GEGFTPEEAQYAMNHLDR* 

Sequence 991 

Contig_0541_pos_10591_9902, 

is similar to (with p-value 4.0e-42)

>sp:sp|P54471|YQFN_BACSU HYPOTHETICAL 23.7 KD PROTEIN IN CCC
A-SODA INTERGENIC REGION. >gp:gp|D84432|BACJH642_139 Bacillu
s subtilis DNA, 283 Kb region containing skin element. NID:
g2627063. >gp:gp|Z99116|BSUB0013_228 Bacillus subtilis compl
ete genome (section 13 of 21): from 2395261 to 2613730. NID:
g2634723.

atgattaaccttaaccaaagattatcaattgtatgctcatttattaaaagaggaacattg 
gctgatattggctcagaccacgcatatctacctatatatgcaattcaaaacgacttatgc 
acaaaagcaatagcgggagaagtgattcaaggaccttataaggctgctaaaagaaatatt 

gcaaattatgaattaaatcaacaggttgatgtacgtctaggcgatggtctaagcgttata
aacacagaagaccaaattgataatataactgtttgtggtatgggagggccattaattgca 
aaaatattaaacgatggaaaagataaattagttaaccatccaagactcatactacaaagc 
aacatacaaactcaagcattaagacaaactcttaataaactttcatatgaaatcgttgat 
gaaagaatcattgaggaaaagggtcacatatatgaaatcgtggtagctgagtttaataat 

aacttagttaaattaaatatattacaagaaaaattcggaccatttttacttagagaatgt
aataacatttttcaaaaaaaatggcaaagagagttagaagcactgcgtgatataaaatcc 
caattgaattcaacatcacatcatgagagactaaaagaaatagaagatgaaattaactta 
atacaagaggtgttaattaatgaaaattag 
Sequence 992

MINLNQRLSIVCSFIKRGTLADIGSDHAYLPIYAIQNDLCTKAIAGEVIQGPYKAAKRNI 
ANYELNQQVDVRLGDGLSVINTEDQIDNITVCGMGGPLIAKILNDGKDKLVNHPRLILQS
NIQTQALRQTLNKLSYEIVDERIIEEKGHIYEIVVAEFNNNLVKLNILQEKFGPFLLREC
NNIFQKKWQRELEALRDIKSQLNSTSHHERLKEIEDEINLIQEVLINEN*

Sequence 993 

Contig_0541_pos_9547_8813,

is similar to (with p-value 7.0e-32) 

>sp:sp|P53434|YRP2_LISMO HYPOTHETICAL 41.4 KD PROTEIN IN RPO
D 3' REGION (ORFA2). >gp:gp|U17284|LMU17284_3 Listeria monocy
togenes major sigma factor (rpoD) gene, partial cds, and dow
nstream orfA1 and orfA2 genes, complete cds. NID: g687597.
atgattaatacaaatagctcatattattacaaagttcaaacttttatacctaaaaattat 

attgaagatttcaaagacagtttaaacgaacttggattagctaaagaaggtaattacgaa
tattgtttctttgaaagtgaaggtaaagggcaatttaaaccagtaggtgatgcaagtcct 
tatatagggaagttagatagtatcgaatatgttgatgaaataaaacttgagtttatgata 
aaagacaatgaattagaaataactaaacgtgctattttagataatcacccatacgaaaca 
ccagtttttgattttattaaaatgaacaaagaaagtgagtatggattagggattattgga 

caattaaaccaaactatgactttagatgaattttctgaatatgccaaaaaacagctcaat
ataccgagcgtacgatatacaggtcaacatgatagtccaattaagaaagtagctatcata 
ggtggttcaggtataggatttgagtataaagctagccaacttggagcagatgtttttgtt 
actggtgatattaaacaccatgatgctttagatgctaaaatccaaaatgtaaatttatta 
gacatcaatcattatagtgagtatgttatgaaagaaggattaaaagaattattagaaaaa 

tggttatttaaatatgaaaatcaatttccaatatatgcttctgaaatcaacacagatcca
tttaaatataaataa 

Sequence 994 

MINTNSSYYYKVQTFIPKNYIEDFKDSLNELGLAKEGNYEYCFFESEGKGQFKPVGDASP
YIGKLDSIEYVDEIKLEFMIKDNELEITKRAILDNHPYETPVFDFIKMNKESEYGLGIIG
QLNQTMTLDEFSEYAKKQLNIPSVRYTGQHDSPIKKVAIIGGSGIGFEYKASQLGADVFV 
TGDIKHHDAL DAKIQNVNLLDINH YSEYVMKEGLKELLEKWLFKYENQFPI YASEINTDP 
FKYK* 

Sequence 995

Contig_0541_pos_8778_7432, 

putative peptide of unknown function 

atgtcaaaacatccatttgaacactttaatttagatgagaatttaattgaagctgttaaa 
aatctcaattttgaaaaaccgactgaaatccaaaatagaatcataccgagaattcttaaa 

ggaacaaatttaataggacaatctcaaactggaactggaaagtcacacgcttttctttta
ccattaattcaacttatagaaagtgacattcaagagccacaagccatcgtagtagctcca 
acacgtgaacttgctcagcaactatatcaagttgctatgcatttagtaaaattcaaaaaa 
ggtataaatgtaaaacttttcattggtggtaccgatttagaaaaagataaacaacgatgt 
agccatcaaccacaactcattattggtacaccaacaagaattaatgatttagcacattca 

ggttatcttcatgcacatttagcgtcatatttaattatagatgaagctgatttaatgatt
gacctcggtctcattgaagatgttgaccatattgcagcgagattagatgatgaaaatgct 
catctagcggtatttagtgcaacaatacctaaatcattacaaccatttttaaataaatat 
ttaagtcaaccagaatttgtagaagttgatggcaaagctcataataaagaaaatatcgaa 
ttttatctaattcctacaaaaggttctgctaaggtagataaaacattggaattgatagat
atattgaatccttatctatgtattattttctgtaacagtcgtgaaaatgccgatgaattg
gcagacactttaaataaagaaggaattaaaataggtatgattcatggtggtttaacacca 
agagaacgtaaacaacaaatgaaaagaataagaaatttagattttcaatttgtcattgca 
agcgatcttgcttctagaggaatagatattgaaggcgtaagtcatgttattaatttcgat 
gtacccaatgatatcgatttcttcacacatcgcgtaggtcgaacaggaagaggtaattat
aaaggtgtagccattacattatatagtcctgatgaagaaagtaatattactcttattgaa
gacagaggttataaatttgaaaatgtagatattaagaatggtgaattaaaaccgataaag 
gcatacaatatgcgtaaatcaagacagcgcaaagatgaccatttaacaaatgaagttaaa 
cacaaagtaagaagtaaatcaaaacgtaaagttaaaccaggctataaaaagaagtttaaa 
caagaagttgaaaaaatgaaacgtcaagaaagaaagcagtatagtaaaaagcaaaataga 






caaaaacgaaaaaataataaaggatag 
Sequence 996 

MSKHPFEHFNLDENLIEAVKNLNFEKPTEIQNRIIPRILKGTNLIGQSQTGTGKSHAFLL 
PLIQLIESDIQEPQAIVVAPTRELAQQLYQVAMHLVKFKKGINVKLFIGGTDLEKDKQRC
SHQPQLIIGTPTRINDLAHSGYLHAHLASYLIIDEADLMIDLGLIEDVDHIAARLDDENA 
HLAVFSATIPKSLQPFLNKYLSQPEFVEVDGKAHNKENIEFYLIPTKGSAKVDKTLELID 
ILNPYLCIIFCNSRENADELADTLNKEGIKIGMIHGGLTPRERKQQMKRIRNLDFQFVIA 
SDLASRGIDIEGVSHVINFDVPNDIDFFTHRVGRTGRGNYKGVAITLYSPDEESNITLIE 
DRGYKFENVDIKNGELKPIKAYNMRKSRQRKDDHLTNEVKHKVRSKSKRKVKPGYKKKFK
QEVEKMKRQERKQYSKKQNRQKRKNNKG* 

Sequence 997 

Contig_0541_pos_5360_4650,
is similar to (with p-value 1.0e-20)

>gp:gp|Z71552|SPADCA_3 Streptococcus pneumoniae adcRCBA oper
on. NID: g3758891. 

gtgacattaggtggtatttcctttggtatgtttttgcttaccattattcccgttttctca 
gtaataaaccctatgtggtttggtattctttttgctgttattggagcgttattaattgaa 

aaattaaggacttcgttttctaattatcaagaaattgcaattcctattataatgagcgct
ggtattgctctaagtgctatttttatttctctagcagatggttttaatcaagaaatcgta 
gacctactatttggatcaattagtgcagtaaatattagtgatttaactacaattattatc 
attacaataattgttctcatatttattgttttattttataaagaattgtttattttatca 
tttgacgaagaatatagtaaggtcataggtataccaaagtggattcaatttttatttata

gtaattgttgctatggtaatatctgcatcaatgagagttgtaggtatattattagtaagc
gcgttaataactcttcctatagcaatttcaatgagaataactaaaggatttaaacaatta 
atagcattaagtgttatattaggagaattatctgtaattctaggattaattatagctttt 
tatatgaatatatcacctggtggcgtcattgttgtactattggtattaatgctcatacta 
acgatgattattcagaagttaaaaattaagtttaaaaagggagtcgtttaa 


Sequence 998 

VTLGGISFGMFLLTIIPVFSVINPMWFGILFAVIGALLIEKLRTSFSNYQEIAIPIIMSA
GIALSAIFISLADGFNQEIVDLLFGSISAVNISDLTTIIIITIIVLIFIVLFYKELFILS
FDEEYSKVIGIPKWIQFLFIVIVAMVISASMRVVGILLVSALITLPIAISMRITKGFKQL
IALSVILGELSVILGLIIAFYMNISPGGVIVVLLVLMLILTMIIQKLKIKFKKGVV*

Sequence 999 
Contig_0541_pos_0_1325, 
is similar to (with p-value 1.0e-80)
>gp:gp|U88888|BFU88888_2 Bacillus firmus MecA homolog (mecA)
and cardiolipin synthase (cls) genes, complete cds. NID: g2952026.

atgaattttggatttttgggtactattttaactatattgttagtagttgggtttataact 
aacgtagtattggcatttgtcatcattttccttgaacgtgatcgacgtactgccagttca 

acttgggcgtggttattcgtattattcgttcttcctgtcattggatttattttgtatcta
tttttaggacgaacggtttccaagaaaaagatggaaaaaaataacggtgatgaattacat 
gcatttgaagatttagttcaagaccaaatcgacagttttgataaacataattatggttat 
atcaatgatcaagtcattaaacaccgtgatttaatacgtatgttgttaatgaaacaagat 
gcctttttaacagaaaataataaaatcgatttatttacagatggtcataagctttatgaa 

aaagtacttgaggatatttacaatgctcaagactatatacatctagagtactataccttt
gaacttgatggattaggtaaaagaatcttagatgcacttgaaactaaacttaaagaaggt 
ttagaagttaaacttttgtatgacgatgttggttctaaaaaggttagattatcaaaattt 
aaacatttcagagcattaggtggagaagttgaagcatttttcccttcgaaagtaccttta 
atcaatttcagaatgaataatcgaaatcatagaaagattatcattatagatggacaaatt 

ggttacgttggcggttttaatgtcggcgatgattatttaggattaggtaagttaggttac
tggagagatacacatacacgtgttcaaggtgaatgcatcgatgcactacaattaagattt
attttagactggaattcacagtcgcatcgtccacaatttaaatttgatcaaaaatatttc
cctaaaaaaaatggggacaaaggaaacgcggctattcaaatcgcttctagtggacctgca 
tttgatttacatcaaatagaatatggttatacaaaaatgataatgagcgctaaaaagtct
atctatctacaaagcccttactttattccagaccaatcatacattaatgcattaaaaatg
gctgctaatagcggcgttgaagtaaaccttatgataccgtgtaaacctgatcatccattc 
gtttattgggctacattttcaaatgcagctgatttattggatagcggagttaatatttac 
acttatcaaaatggatttattcattctaaaatattaatgattgatgatgaaatttcttca 
attggtagtgcaaacatggactttagaagctttgaactgaatttcgaagtgaatgcattt
atata 

Sequence 1000 

MNFGFLGTILTILLVVGFITNVVLAFVIIFLERDRRTASSTWAWLFVLFVLPVIGFILYL 
FLGRTVSKKKMEKNNGDELHAFEDLVQDQIDSFDKHNYGYINDQVIKHRDLIRMLLMKQD
AFLTENNKIDLFTDGHKLYEKVLEDIYNAQDYIHLEYYTFELDGLGKRILDALETKLKEG
LEVKLLYDDVGSKKVRLSKFKHFRALGGEVEAFFPSKVPLINFRMNNRNHRKIIIIDGQI 
GYVGGFNVGDDYLGLGKLGYWRDTHTRVQGECIDALQLRFILDWNSQSHRPQFKFDQKYF 
PKKNGDKGNAAIQIASSGPAFDLHQIEYGYTKMIMSAKKSIYLQSPYFIPDQSYINALKM
AANSGVEVNLMIPCKPDHPFVYWATFSNAADLLDSGVNIYTYQNGFIHSKILMIDDEISS
IGSANMDFRSFELNFEVNAFIX

Sequence 1001 

Contig_0542_pos_1002_1676,

putative peptide of unknown function

atgttgattatttttactgctttaatgattattgctaatttttactatatattttttgaa 
aaaattggctttttactagtactcctattaggatgtgtacttgtatatgtagggtatgtg 
tattttcataaagtaagaggactactatctttttggataggaaccttattaattgctttt 
acacttttgtctaataagtacacgataattattctatttatatttttaatagtagtcatc 

atacgttatttggtttataagtttagacctttaaaagtgattgctacagatgaagaaatc
acatcacccatttttattaagcaaaaatggtttggtgaacaacatacaccagtgtatgta 
tataaatgggaagacgtacagattcaacacggtataggagacatacacattgatatgaca 
aaagcggcaaatattaaggaaacaaataccatagttgtgcgtcatattttaggtaaagta 
caagtagttgtacctcttaattataatataaatttacatgcgactctcttctacggcact

gcttatgtgaacgataaatcttataagattgagaataaccatgttcaaattgaagaaaaa
acgaaagatgataattatactgttaatgtttacgtttcatcattcattggagacgtagag 
gtgatttacagatga 

Sequence 1002 

MLIIFTALMIIANFYYIFFEKIGFLLVLLLGCVLVYVGYVYFHKVRGLLSFWIGTLLIAF
TLLSNKYTIIILFIFLIVVIIRYLVYKFRPLKVIATDEEITSPIFIKQKWFGEQHTPVYV
YKWEDVQIQHGIGDIHIDMTKAANIKETNTIVVRHILGKVQVVVPLNYNINLHATLFYGT
AYVNDKSYKIENNHVQIEEKTKDDNYTVNVYVSSFIGDVEVIYR* 

Sequence 1003

Contig_0542_pos_1703_2719, 

is similar to (with p-value 1.0e-42)

>gp:gp|U81487|LLU81487_1 Lactococcus lactis subsp. cremoris
MG1363 histidine kinase (llkinD) gene, complete cds. NID: g2182993.

atgcttatattagtatatagtatgcttattgcttttttatttattgataaagtgtttgta 
aatattatcttttttcaggggatgttttatacacaaatatttggaatacctgtttttcta 
tttttaaatttattaattgttcttttatgtattatagttggatctgttttagcttataaa 
attaatcaacaaaatgattggattatttcacaaatagaaagatcaatagaaggacaaaca 

gtaggtatcaatgatcaaaatatcgaattatatacagaaacgatagatatttatcataca
ctagttccattaaatcaagaattacatcgacttagaatgaagactcaaaatttaactaat 
gaaaactacaatattaatgatgtaaaagtcaaaaagattatcgaagatgagcgacaacga 
cttgccagggaattacatgattctgttagtcaacaattatttgctgcgagcatgatgcta 
tcggcgataaaagagtcgaaattagaaccacctttaaatcaacagataccaattcttgaa 

aaaatggttcaagactcacaacttgaaatgagagctttgttattacatttaagaccgata
ggtttaaaagataagtctttaggtgaaggaattaaagatttagtcatcgatttacaaaag 
aaagtaccaatgaaagttgtgcatgaaattcaagattttgaagtgccaaaaggcattgaa 
gatcacttgttcagaattacacaagaagctatttcaaatacattgagacattcaaatggt 
acaaaagtaactgtggaattatttaatcaagaggattatcttttactaagaattcaagat






aatggaaaagggtttaatgtagatgaaaaatttgaacaaagttatggtttgaaaaatatg 
cgagaacgagcgttagaaattggtgcgacgtttcatattgtatctttacctgattcaggt 
acacgaattgaagttaaggcaccattgaataaggaggagaattcaagtggcgattaa 

Sequence 1004

MLILVYSMLIAFLFIDKVFVNIIFFQGMFYTQIFGIPVFLFLNLLIVLLCIIVGSVLAYK
INQQNDWIISQIERSIEGQTVGINDQNIELYTETIDIYHTLVPLNQELHRLRMKTQNLTN 
ENYNINDVKVKKIIEDERQRLARELHDSVSQQLFAASMMLSAIKESKLEPPLNQQIPILE
KMVQDSQLEMRALLLHLRPIGLKDKSLGEGIKDLVIDLQKKVPMKVVHEIQDFEVPKGIE 
DHLFRITQEAISNTLRHSNGTKVTVELFNQEDYLLLRIQDNGKGFNVDEKFEQSYGLKNM
RERALEIGATFHIVSLPDSGTRIEVKAPLNKEENSSGD* 

Sequence 1005 

Contig_0542_pos_2868_3338,

is similar to (with p-value 3.0e-24)

>sp:sp|P55184|YXJL_BACSU HYPOTHETICAL TRANSCRIPTIONAL REGULATOR
IN GALE-PEPT INTERGENIC REGION. >gp:gp|X99339|BSGALE_5
B.subtilis orfs 1,2,3,4, pepT and galE genes. NID: g1429253.
atggatttacttatggacgatatggatggtgtagaagcaactactgaaataaaaaaagat 

ttacctcaaattaaagtagtcatgttaacaagctttatagaggataaagaagtttatcgt
gcacttgattctggagtagatagttatattttaaagacaacaagtgcaagtgatatagct 
gacgctgtgcgtaaaacgtatgaaggtgaatcagtatttgaaccagaagtgttagtaaaa 
atgcgcaatcgtatgaaaaaacgtgccgagctttatgaaatgttgacagaaagagaaatg 
gagatcctattacttatagctaaaggatactctaaccaagagattgcaagcgcctctcat 

atcaccatcaaaacagtaaaaactcatgtaagtaacatactaagtaaattagaagtacaa
gatcgaacacaagcagtaatatatgcgttccagcataatttaattcaataa 

Sequence 1006 

MDLLMDDMDGVEATTEIKKDLPQIKVVMLTSFIEDKEVYRALDSGVDSYILKTTSASDIA 
DAVRKTYEGESVFEPEVLVKMRNRMKKRAELYEMLTEREMEILLLIAKGYSNQEIASASH
ITIKTVKTHVSNILSKLEVQDRTQAVIYAFQHNLIQ* 

Sequence 1007 

Contig_0542_pos_4575_3397,
is similar to (with p-value 1.0e-18)

>gp:gp|AF071085|AF071085_2 Enterococcus faecalis strain OG1RF
polysaccharide biosynthetic gene cluster, partial sequence.
NID: g3608387.

atgtcaaaaaaagagaaaacaacttctaaatatcttaattcaatagaagataaagagcat 

aaaaagaataaaaaaatagaagttgaccgtacatatatagaacctcaagaattccaatct
aagaaacctaaaaaaaagaatcaagtattttttgtttcccggctgaataaaccagcaaaa 
tacaccgaaaactctaatttcttttcttacctgatttataggataggtaaagatgacgct 
gcaggtttagcagcacagatgacatatcattttgtattagcacttttcccaatgctaatt 
tttttacttacgctacttggtcaatttatcacgattgatgctaatcagattaatcaaaaa 

gtaagtcaatatgtccctgatcaagaaacagctagcatcgttggtggaattgttaaagat
atctctgacactgccagtggaggtattttgtcagttgggttaattttagctatttggtca 
gcatcaaatggaatgtccgctattattaactcatttaatgttgcttatgacgttgaggat 
tctcgaaacggtgtagtagttaaattattaagtattctatatacacttgttttaggtgca 
gtatttgttgttgctgtagtacttataacactaggtccagtcattaataaatttttattt 

ggaccactaggtattgataatcaaattgaatggatttttaatttagtacgaattgttatt
ccattgattattattttcatcatatttactgtactttattcagttgcacctaatgttaaa 
acaaaattacgttctgttattcctggcgctattttcacttccattatttggttactaggg 
tcctttgcattcggttactatatttcaaactttagtaactattcgaaaacatacggaagt 
ttagctggtattatcattttattcttatggttgtatatcacaagctttattattattatc 

ggtgctgaaatcaatgcaattattcaccaaagaaaagtcatagctggtcacacgcctgaa
gaagccgctattaaacatgatgataacaatgaaaatcactataacgaaaatacgacttat 
gaatactatgaagatagcaaagatgtagatatctctaatgaagatgacacgtataatatc 
aatcatcaatctaaagaagaacatcacacaagcgactga

Sequence 1008

MSKKEKTTSKYLNSIEDKEHKKNKKIEVDRTYIEPQEFQSKKPKKKNQVFFVSRLNKPAK 
YTENSNFFSYLIYRIGKDDAAGLAAQMTYHFVLALFPMLIFLLTLLGQFITIDANQINQK 
VSQYVPDQETASIVGGIVKDISDTASGGILSVGLILAIWSASNGMSAIINSFNVAYDVED 
SRNGVVVKLLSILYTLVLGAVFVVAVVLITLGPVINKFLFGPLGIDNQIEWIFNLVRIVI
PLIIIFIIFTVLYSVAPNVKTKLRSVIPGAIFTSIIWLLGSFAFGYYISNFSNYSKTYGS
LAGIIILFLWLYITSFIIIIGAEINAIIHQRKVIAGHTPEEAAIKHDDNNENHYNENTTY
EYYEDSKDVDISNEDDTYNINHQSKEEHHTSD* 

Sequence 1009

Contig_0543_pos_3868_3380,

putative peptide of unknown function 

atgccaaaagtacatcaagttaaggaaagatttgtgaaattaggggaccaacagtttaaa 
gcatttgaaattagatacgatacatacattcattacgtgttgatgtgtgatggtgtagat 

ttagcaatgaaacagcgcgtggaagattttgtcagtgcgcaaacatggcatcaacaattt
aaaacgattggcgtcatgctttttcaacaagataaacaattcatatatccactgatacat 
atacctaaaatagatagcttaatctgggaaaatagctgtggttcaggagcggcttctatc 
ggtgtgttagttaattatctaacagatcatgatattcaagattacctagttaaccaaccc 
ggaggcagtattattgtctcatccagaaagtctggacaaaatgaataccaaacaacgatt 

aagtgtcaagtttcaactgtcgcaacaggacaagcatatatagaacaggagacaatgacg
caaatatga 

Sequence 1010 

MPKVHQVKERFVKLGDQQFKAFEIRYDTYIHYVLMCDGVDLAMKQRVEDFVSAQTWHQQF 
KTIGVMLFQQDKQFIYPLIHIPKIDSLIWENSCGSGAASIGVLVNYLTDHDIQDYLVNQP
GGSIIVSSRKSGQNEYQTTIKCQVSTVATGQAYIEQETMTQI*

Sequence 1011 
Contig_0543_pos_2560_1283,
putative peptide of unknown function 

atggttggtagtggaccggtcgctattcaacttgctcgactatgtcatttacatggagaa 
catatagttgatatggtgagtcgcgttcatgcatcaaccaaatctaagagagtctttgat 
gcttatcaacgtgacggctttttttcagtaatgactcaaaatgatgcacatcagtgtttt 
tcaggtaagtttacggttagacatttttttaaagatgttaaagatattactgaatattat 
gacgtggtgattttagcatgtactgccgatgcgtatcgaccgatattacagcaattatct 
aagtccacattaaagcgtattaagcaaatcatcttggtctcaccaacattaggatcacat 
atgcttgttaagcaattactatcagatgttcaatgtgaaggtgaagtgatttcattttcc
acttatctaggcgatacccgaatatttgataaagcacaaccacattgtgtcctaaccaca 
cgagttaaatcaaaattattcgtaggttcgactcaatctcagtctatgacgttgtgtaag
cttaagtctttatttgactatttgaatatagaattaacaacgatggacacaccactacat 
gcggagatacataatagttcactttatgtacacccaccattgtttatgaatcaattttca 
ttaaaggcggtatttgaagggacgaaagtaccagtatatgtatataagctatttccagag 
ggtccaatcacaatgaccttaatacacgaaatgcgattaatgtggcaagaaatgatgatg 
atattaaaaaaattaaaggtaccttcggtcaatcttctaaagtttatggtgaaagaaaac 
taccctatacgttatgagaccatgcgcgaagtagatattgaaaactttaaaaatttacca 
gctattcatcaagagtatctactttatgtgcgatatacagcaattttaatcgatccgttt 
tctaatccggacgatcaaggtgcatattttgatttttctgccgtaccatacaaacatgtt 
gatactgatgaacaaggagtcatacatataccacgcatgccgagtgaagattattatcgt 
actttgataattcaagcgattggaagagcattaaacgttgcaacaccgatgattgacaca 
ttgttattacgttatgaaaatactgttaaacaatactgtgacacacatttacatcaacaa 
ctatcaaggcaattcgaattacatcattttaaacaggatttagcgttagtgacgaactac 
ttaactttttataaataa 

Sequence 1012 

MVGSGPVAIQLARLCHLHGEHIVDMVSRVHASTKSKRVFDAYQRDGFFSVMTQNDAHQCF
SGKFTVRHFFKDVKDITEYYDVVILACTADAYRPILQQLSKSTLKRIKQIILVSPTLGSH
MLVKQLLSDVQCEGEVISFSTYLGDTRIFDKAQPHCVLTTRVKSKLFVGSTQSQSMTLCK 
LKSLFDYLNIELTTMDTPLHAEIHNSSLYVHPPLFMNQFSLKAVFEGTKVPVYVYKLFPE 
GPITMTLIHEMRLMWQEMMMILKKLKVPSVNLLKFMVKENYPIRYETMREVDIENFKNLP
AIHQEYLLYVRYTAILIDPFSNPDDQGAYFDFSAVPYKHVDTDEQGVIHIPRMPSEDYYR
TLIIQAIGRALNVATPMIDTLLLRYENTVKQYCDTHLHQQLSRQFELHHFKQDLALVTNY 
LTFYK* 

Sequence 1013

Contig_0543_pos_0_1232,

is similar to (with p-value 0.0e+00)

>gp:gp|AF076683|AF076683_1 Staphylococcus aureus oligopeptide
transporter putative substrate binding domain (opp-1A),
oligopeptide transporter putative membrane permease domain
(opp-1B), oligopeptide transporter putative membrane permease
domain (opp-1C), oligopeptide transporter putative ATPase
domain (opp-1D), and oligopeptide transporter putative ATPase
domain (opp-1F) genes, complete cds; and unknown gene.
NID: g3800817.

atgaataaactcacaaaactaagtacagtcatttttgtatctggaattattttagccggt 
tgtggaaataacaaagaactaacagagaaaaaagagaataaagtattatcatatacaact 
gtcaaagatattggagatatgaatccccatgtttatggaggttcaatgtcagcagaaagt 
atgatttatgagccgttagttcgcaataccaaggatggtattaagccattattagcaaaa 

aaatgggacatttcacctgatggtaagacatatacgtttcatttaagggatgatgtatct
tttcatgatggtacgaaatttgatgcagatgcagtgaagaaaaacatcgatgcagtacaa 
caaaataagaaactacattcatggttaagactttcaacactgattgatgatgtcaaagtt
aaggataagtatacgatacaactacatttgaaggaagcttatcaacctgcgttagcagaa 
ctagctatgccacgaccatacgtttttgtatcgcctaaagattttaaacacggcacaacc 

aaagatggtgtgaaatcatttgacggtacaggaccatttaaaatgggtgaacacaaaaaa
gatatatctgcagagtttaataaaaataatcaatattggggagaaaaggcaaaattaaat 
aaagtagaagcaaaagttaaacctgcaggagaaacaacatttttatcaatgaaaaaagga 
gaaaccaactttgcttatacagatgatagaggtacagacagcttagataaagatagttta 
aaacaattaaaagaaaccggaagctaccaagtaaaacgtagccaggctatgaatacaaaa 

atgcttgttgttaattctggtaagaaagatagtgcagtcagtgataaagcagtcagacaa
gcattaggtcacatggtaaatagagataaaatagctcaagatattttagacaaacaagaa 
aagccagccacacaactatttgctaaaaatgtgacagatataaactttaatttaccaaca 
agaacatatgataagaaaaaagcgcaagcgttattagacaaggctggatgggtgctttca 
aaagatcgacaagttcgtcaaaaagagggcaaagatttgaatcttaagttgtattatgac 

aaagggtcttccagtcaaaaagaacaagctgaattcttagaagcagaatttaagaagtta
ggtgtacaactagatataaacggagaaacgtT 

Sequence 1014 

MNKLTKLSTVIFVSGIILAGCGNNKELTEKKENKVLSYTTVKDIGDMNPHVYGGSMSAES
MIYEPLVRNTKDGIKPLLAKKWDISPDGKTYTFHLRDDVSFHDGTKFDADAVKKNIDAVQ
QNKKLHSWLRLSTLIDDVKVKDKYTIQLHLKEAYQPALAELAMPRPYVFVSPKDFKHGTT
KDGVKSFDGTGPFKMGEHKKDISAEFNKNNQYWGEKAKLNKVEAKVKPAGETTFLSMKKG 
ETNFAYTDDRGTDSLDKDSLKQLKETGSYQVKRSQAMNTKMLVVNSGKKDSAVSDKAVRQ 
ALGHMVNRDKIAQDILDKQEKPATQLFAKNVTDINFNLPTRTYDKKKAQALLDKAGWVLS 
KDRQVRQKEGKDLNLKLYYDKGSSSQKEQAEFLEAEFKKLGVQLDINGETX

Sequence 1015 

Contig_0545_pos_1330_1851,

putative peptide of unknown function 

atgtcaaaaatcttaaacacacaattaactggtatttttaatcggcttgaaaaacaagag
ttggatattcaaatggcagctcaatgtctcattcaagcaattggtggagaaggacatgtc 
tatatcaaaggctacgatgatttaaaattctatgagtcattcatattacaaagccatgaa 
aaattagcgtctagcttaccacttgaagatttacaaaattttaacgatatagatacaaca 
gatagggtactgttattttcaccatactacacttcggaagttgaaagtgatgtacttcaa 

cttattgatttagatgtcgatttagtgcttatttgtaataaccctaaacgagatgatttt
cctaatcatttaattcattatgttaatttatcaacacctaggcccattgtttacacagaa 
gattatgataaaatcattcaaccacatccgatggccttaaattatatttattatgatatt 
tatactcaaatgattgagatgactagagacctagatttatag

Sequence 1016

MSKILNTQLTGIFNRLEKQELDIQMAAQCLIQAIGGEGHVYIKGYDDLKFYESFILQSHE 
KLASSLPLEDLQNFNDIDTTDRVLLFSPYYTSEVESDVLQLIDLDVDLVLICNNPKRDDF 
PNHLIHYVNLSTPRPIVYTEDYDKIIQPHPMALNYIYYDIYTQMIEMTRDLDL* 


Sequence 1017 

Contig_0545_pos_6198_6629,

putative peptide of unknown function 

atgatacaaggtttaggctatttattgtccaatataacagattataaagaattaacgaat 
ttagctcaaaatggagatcgtgatgccattgatttaaaagtaaaacatatttataaagat
actgaaccaccaattcctggagatttaacagcagcaaattttggaaatgtattacatcac 
ttagataatcagtttacatcagctaacaaacttgcctctgcaattggcgtcgttggtgaa 
gttataacaactatggctattacattagcacgtgaatataagactaagcacgttgtatat 
atcggttcatcatttaataacaatcaattactacgtgaagttgttgaaaattacactgtt 
ctaagaggatttaaaccgtactatattgagaatggtgctttttcaggcgctttaggagca
ctttacctctaa 

Sequence 1018 

MIQGLGYLLSNITDYKELTNLAQNGDRDAIDLKVKHIYKDTEPPIPGDLTAANFGNVLHH 
LDNQFTSANKLASAIGVVGEVITTMAITLAREYKTKHVVYIGSSFNNNQLLREVVENYTV
LRGFKPYYIENGAFSGALGALYL* 

Sequence 1019 

Contig_0545_pos_5531_4671,

putative peptide of unknown function

atgtttataaaaaaggattttgatgatattacagttcaagtatttgaagaaaaatataga 
gatgcacttaaccaatttgaattaagtgaacgacaacaaatatattcttcattgcctcaa 
actgttttagatgatgcattaaaagatgaaaatcgaattgctaatgtagctttaaataaa 
gaaggaaaagtagtggggttcttcgttttgcatcgttattatcaacatgaaggttatgat 

acaccaaacaatgttgtttatgtacgttcattgtcagttaatgaaaagtttcaaggccat
ggatatgggacaaaaatgatgatgtttttaccagagtatgttcaagcattatttcctgat 
tttacacatttatacttagtagtagacgctgaaaaccaaagtgcttggaacgtttatgaa 
cgtgcaggttttatgcatacagctacaaaagaagaaggacctattgggaaagaaagactt 
tattatttagatttagattcaaaacatgtatcttctttaaggctaaaagagggggaagtc 

acatataatgatgatattcacgtgattaatttgcttaaagatgatgtaaaggtaggcttt
attgcactagaacaaaatgataataaaatgaatatttctgcaatcgaagttaataagaaa 
aataggaatgagggaattgcagaaagtgctttacgccaattaccaacgtatatacgtaaa 
cagtttgaagacattgaagttttatcaattactttatatggcgaacgtaatgaattaaaa
ccattgtgcttgaatagtaattttgtagcaatagaggaaactgaggattatacacgtttt 

gaaaaatatattaattattaa

Sequence 1020 

MFIKKDFDDITVQVFEEKYRDALNQFELSERQQIYSSLPQTVLDDALKDENRIANVALNK 
EGKVVGFFVLHRYYQHEGYDTPNNVVYVRSLSVNEKFQGHGYGTKMMMFLPEYVQALFPD 
FTHLYLVVDAENQSAWNVYERAGFMHTATKEEGPIGKERLYYLDLDSKHVSSLRLKEGEV
TYNDDIHVINLLKDDVKVGFIALEQNDNKMNISAIEVNKKNRNEGIAESALRQLPTYIRK 
QFEDIEVLSITLYGERNELKPLCLNSNFVAIEETEDYTRFEKYINY* 

Sequence 1021 
Contig_0545_pos_4560_4024,

is similar to (with p-value 1.0e-17)

>sp:sp|P12464|RPOE_BACSU DNA-DIRECTED RNA POLYMERASE DELTA
SUBUNIT (EC 2.7.7.6). >pir:pir|JT0302|JT0302 DNA-directed RNA
polymerase (EC 2.7.7.6) delta chain - Bacillus subtilis
>gp:gp|M21677|BACRPOE_1 B. subtilis RNA polymerase delta subunit
(rpoE) gene, complete cds. NID: g143455.
>gp:gp|Z49782|BSDNA320D_9 B. subtilis chromosomal DNA (region
320-321 degrees). NID: g853752.
>gp:gp|Z99123|BSUB0020_13 Bacillus subtilis complete genome
(section 20 of 21): from 3798401 to 4010550. NID: g2636240.

atgaaaattcaagattacacaaaagaaatggttgatgagaaatcattcatcgatatggcc 
tatactttattaaatgataaacaaacaacgatgaatttatatgatattattgatgaattt 
aaatctttaggcggatatgagtatgaagatattgaaaatcgaatcgtacaattctatacc 

gatttaaacactgatggtcgttttttaaatgtaggagaaaatctttggggtctacgtgat
tggtactctgtagatgatattgaggaaaaaatcgcaccaacaattcaaaaattcgatatt 
ctagatgacgaagatgaagaagatcaaaaccttaaattattaggtgatgacgacgctgat 
gaagatgacgatattcctgctcaaacagatgatcaagaaacattagacgagtcagataat 
gatgaagatgatgttgaaatgaatgaagcagatatcgttattgatgaagacgaagacgaa 

gatattgctgaaggtgaagaagaagcctttgaagacgccgaagactttaatgattaa

Sequence 1022 

MKIQDYTKEMVDEKSFIDMAYTLLNDKQTTMNLYDIIDEFKSLGGYEYEDIENRIVQFYT
DLNTDGRFLNVGENLWGLRDWYSVDDIEEKIAPTIQKFDILDDEDEEDQNLKLLGDDDAD
EDDDIPAQTDDQETLDESDNDEDDVEMNEADIVIDEDEDEDIAEGEEEAFEDAEDFND*

Sequence 1023 

Contig_0545_pos_3687_2080,

is similar to (with p-value 0.0e+00)

>sp:sp|P13242|PYRG_BACSU CTP SYNTHASE (EC 6.3.4.2) (UTP--AMMONIA
LIGASE) (CTP SYNTHETASE). >pir:pir|A32354|SYBSTP CTP synthase
(EC 6.3.4.2) - Bacillus subtilis >gp:gp|M22039|BACSPO0FA_2
Bacillus subtillis spoOF, CTP synthetase (ctrA), and
fructose-bisphosphate aldolase (orfY-tsr) genes, complete cds.
NID: g460910. >gp:gp|Z49782|BSDNA320D_10 B. subtilis chromosomal
DNA (region 320-321 degrees). NID: g853752.
>gp:gp|Z99123|BSUB0020_12 Bacillus subtilis complete genome
(section 20 of 21): from 3798401 to 4010550. NID: g2636240.
atgacaaagtttatttttgtaacaggcggggttgtgtcatcattaggaaaaggaataaca 

gccgcttctctaggaagattacttaaagatagaggacttaaagttacaatacaaaaattc
gatccatatttaaatgtagacccaggcacaatgagtccgtatcaacatggtgaagtgttc 
gttacagacgatggtgctgagactgatttagacttaggacattatgaacgttttatagat 
attaatttaaataaatattcaaatgttactgccggaaaagtatattcacatgtgttgaaa 
aaagaacgccgtggtgattacttgggtggtactgtacaagttattccccatattacaaac 

gaaattaaagaaagattgctattagctggtgagagtactaatgcggatgttgtaattact
gaaattggtggaacaacaggtgatatagagtctttacctttcttggaagccattcgtcaa 
attagaagcgacttaggtcgtgaaaatgtaatgtatgtacattgtactttgctaccatat 
attaaagctgctggggaaatgaaaacaaaacctacacagcacagtgttaaagaattacga 
ggtctaggtattcaacctgatttaatagtagtacgtacagaatacgaaatgacacaagat 

ttgaaagacaaaatcgccctattttgtgatatcaaaaaggaaagtgttatagaatgtaga
gatgcagattctctttatgaaattccgttacaacttagtaagcaaaatatggacgacatt 
gttattcaacgtttacaattaaatgccaagtatgaaacgcaattggatgagtggaaacat 
ctattaaataccgttaataatttagatggtaaaattacaatcggtttagttggtaaatat 
gtgagcttacaagatgcttatctatcagttgttgaatcacttaagcatgctggttatcca 

tttaaaaaagacgttgtggtaaaatggattgattcaagtgaggtcaatgatgataatgtt
gaggcttatttatccgacgttgatggtattttagttcctggtggatttggattcagagca 
agtgaaggtaaaattgcagctattcgttatgcccgtgagaataacataccattctttggc 
atttgtctaggaatgcaattggcaactgttgaatttgcgcgtcatgttttaggctatgaa 
ggtgcgcattcagcagaattagatccaagtacaccatatccaattatagatttattacca 

gaacaaaaagatattgaagatttaggtggaaccttaagacttggtctttatccttgccac
attaaagaaggtacattggcagagaaaatttataataaaaacgatattgaagaacgtcat 
cgtcatagatatgaattcaataacgagtttagggaacaattagaaagtaacggtatggta 
ttttcaggtacaagtccagatggtcgtttagtagaaattattgaaatacctaaaaatgat 
ttctttattgcatgtcaattccatcctgaattcttatcaagacctaatcgtccacagcct 

atatttaaatcatttgtagaagcggcgttgaattaccaacaaaaataa

Sequence 1024 

MTKFIFVTGGVVSSLGKGITAASLGRLLKDRGLKVTIQKFDPYLNVDPGTMSPYQHGEVF 
VTDDGAETDLDLGHYERFIDINLNKYSNVTAGKVYSHVLKKERRGDYLGGTVQVIPHITN
EIKERLLLAGESTNADVVITEIGGTTGDIESLPFLEAIRQIRSDLGRENVMYVHCTLLPY
IKAAGEMKTKPTQHSVKELRGLGIQPDLIVVRTEYEMTQDLKDKIALFCDIKKESVIECR 
DADSLYEIPLQLSKQNMDDIVIQRLQLNAKYETQLDEWKHLLNTVNNLDGKITIGLVGKY 
VSLQDAYLSVVESLKHAGYPFKKDVVVKWIDSSEVNDDNVEAYLSDVDGILVPGGFGFRA 
SEGKIAAIRYARENNIPFFGICLGMQLATVEFARHVLGYEGAHSAELDPSTPYPIIDLLP
EQKDIEDLGGTLRLGLYPCHIKEGTLAEKIYNKNDIEERHRHRYEFNNEFREQLESNGMV
FSGTSPDGRLVEIIEIPKNDFFIACQFHPEFLSRPNRPQPIFKSFVEAALNYQQK* 

Sequence 1025 
Contig_0545_pos_1097_252,

is similar to (with p-value 0.0e+00)

>sp:sp|P13243|ALF1_BACSU PROBABLE FRUCTOSE-BISPHOSPHATE ALDOLASE 1
(EC 4.1.2.13). >pir:pir|S55426|D32354 fructose-bisphosphate
aldolase (EC 4.1.2.13) - Bacillus subtilis
>gp:gp|M22039|BACSPO0FA_4 Bacillus subtillis spoOF, CTP synthetase
(ctrA), and fructose-bisphosphate aldolase (orfY-tsr) genes,
complete cds. NID: g460910. >gp:gp|Z49782|BSDNA320D_13 B.subtilis
chromosomal DNA (region 320-321 degrees). NID: g853752.
>gp:gp|Z99122|BSUB0019_209 Bacillus subtilis complete genome
(section 19 of 21): from 3597091 to 3809700. NID: g2636029.
>gp:gp|Z99123|BSUB0020_9 Bacillus subtilis complete genome
(section 20 of 21): from 3798401 to 4010550. NID: g2636240.
atgaaagaaatgttaatcgatgcgaaagaaaacggttatgcggttggtcaatacaatctt 
aataacctcgaatttacacaagctattttagaagcgtctcaagaagagaatgcgccagtt 

attttaggtgtttctgaaggggcagctcgttatatgagtggtttttatacagttgtgaaa
atggtagaaggtttaatgcatgacttaaacatcacaatcccagtagcaattcatttagac 
cacggttcaagctttgaaaaatgtaaagaagcaattgatgctggattcacatctgtaatg 
attgatgcatctcatagtccttttgaagaaaatgttgaaatcacttctaaagtagttgag 
tatgctcatgatagaggcgtttctgtagaagctgaattaggtacagttggtggacaagaa 

gacgacgtagttgctgatggcgttatctatgcagaccctaaagaatgtcaagaattagta
gaaaaaactggaattgatactttagctccagcattaggttctgtacatggaccatataaa 
ggtgaacctaaattaggatttaaagagatggaagaaattggtgcttcaactggattacct 
ttagtattacacggtggtacaggtattccaactaaagatattcaaaaagctattccttat 
ggtactgctaaaattaacgtgaatactgaaaatcaaattgcgtctgctaaagcagttcgt 

gaagtattaaacaacgacaaagatgtgtatgatccacgtaaatatttaggaccagcacgt
gaagcaattaaagagacagttaaaggtaaaattagagaattcggtacttctaatcgcgct 
aaataa 

Sequence 1026 

MKEMLIDAKENGYAVGQYNLNNLEFTQAILEASQEENAPVILGVSEGAARYMSGFYTVVK
MVEGLMHDLNITIPVAIHLDHGSSFEKCKEAIDAGFTSVMIDASHSPFEENVEITSKVVE
YAHDRGVSVEAELGTVGGQEDDVVADGVIYADPKECQELVEKTGIDTLAPALGSVHGPYK
GEPKLGFKEMEEIGASTGLPLVLHGGTGIPTKDIQKAIPYGTAKINVNTENQIASAKAVR 
EVLNNDKDVYDPRKYLGPAREAIKETVKGKIREFGTSNRAK* 


Sequence 1027 

Contig_0546_pos_4340_4023,

is similar to (with p-value 3.0e-30)

>gp:gp|Z99119|BSUB0016_180 Bacillus subtilis complete genome
(section 16 of 21): from 2997771 to 3213410. NID: g2635411.
>gp:gp|U47861|BSU47861_2 Bacillus subtilis gbsAB operon, glycine
betaine aldehyde dehydrogenase GbsA, alcohol dehydrogenase GbsB
genes, complete cds. NID: g1524391.

gtgtcaatttcacgttcccattttttagtaaaaaagttgcggaagaaagtgaagaaatct
ttctctgcaataaaatgttgctttcggctaccacgcgtgaattgttgtttgacgatatcg
tattcttgtagtttttttacacctgtactcatactaggtttactcatttgcagttgttgt 
cgcatttcatcaagtgtcatacttccttcaaaaaccataatgccatacaagttacctaca 
ctacggttgataccatacaaatccatggtttcaccgattgagttgataactaaatcttta 
gcttcttcgatatattga

Sequence 1028

VSISRSHFLVKKLRKKVKKSFSAIKCCFRLPRVNCCLTISYSCSFFTPVLILGLLICSCC 
RISSSVILPSKTIMPYKLPTLRLIPYKSMVSPIELITKSLASSIY* 


Sequence 1029 

Contig_0546_pos_3528_2038,

is similar to (with p-value 0.0e+00)

>sp:sp|P71016|DHAB_BACSU BETAINE ALDEHYDE DEHYDROGENASE
(EC 1.2.1.8) (BADH). >gp:gp|Z99119|BSUB0016_179 Bacillus subtilis
complete genome (section 16 of 21): from 2997771 to 3213410.
NID: g2635411. >gp:gp|U47861|BSU47861_3 Bacillus subtilis gbsAB
operon, glycine betaine aldehyde dehydrogenase GbsA, alcohol
dehydrogenase GbsB genes, complete cds. NID: g1524391.

atggaacttgtagataaattatcaaatcgtcaatatattgatggagaatgggttgaaagt 
tcaaataaaaacacaagagatattataaatccttacaatcaagaaacaatcttcactgta 
gctgaaggaactaaagaagatgttgaaagagcaattttagctgctagaagatctttcgaa 
gacggtgaatggtcacttgaaacaagtgaagtcagaggtaaaaaagtgagagccgttgct 

gataaaattaaagaaaatagagaagagttagctaaattagaaacattagacactggtaaa
actttagaagaatcctatgctgatatggatgatattcataatgtgtttatgtattttgct 
ggtttagctgataaagatggcggtgaaattatcaattcacctattcctaatgctgaaagt 
aaagtagttaaagaacctgtaggtgttgttactcaaattacaccttggaactatccatta 
cttcaagcatcttggaaaattgcgccagctttagcaacaggttgctcattagttatgaaa 

ccaagtgaaattactccgttaacaacaattcgtgtatttgaattgatggaggaagttggt
ttccctaaaggaacaattaatttagtacttggtgctggatcagaagtgggcgacgtgatg 
tcaggtcatgaagaagtcgatttagtttcatttacaggtggtattgaaacaggaaaacac 
atcatgaaacaagcagctaatcacgtgactgacgttgcattagaattaggcggcaaaaat 
cctaatattatttttgatgacgctgattttgaattagctgtagaccaagcacttaatggt 

ggatatttccacgctggtcaagtgtgctctgctggttcaagaatcttagttcacaatgat
attaaagataaattcgaaaaagctcttatcgatcgtgtaagcaaaatcaaattaggtaac 
ggttttgatcaagatactgaaatgggaccagttatctcaacagcacaccgcgataaaatt 
gaaggttatatggaagttgcgaaaaaagatggagcaacaattgcaattggtggtaaacgc 
cctgaacgtgaagacttacaagccggattattctttgaacctactgtaattacagattgt 

gatacatcaatgcgtattgttcaagaggaagtctttggaccagttgtgactgtagaagga
tttgctgacgaagaagaagctattcgcttagcaaatgattcaatttacggtttagcaggt 
gctatatttactaaagatattggtaaagcacaacgtgttgcaaataaattgaaacttggt 
acggtttggattaacgatttccatccatactttgcacaagcgccatggggcggttacaaa 
caatcaggtatcggtagagaattaggtaaagaaggattagaggaatatttagtaagtaaa 

cacattcttacaaatactaatccagaaccagtggattggttcagtaaataa

Sequence 1030 

MELVDKLSNRQYIDGEWVESSNKNTRDIINPYNQETIFTVAEGTKEDVERAILAARRSFE
DGEWSLETSEVRGKKVRAVADKIKENREELAKLETLDTGKTLEESYADMDDIHNVFMYFA 

GLADKDGGEIINSPIPNAESKVVKEPVGVVTQITPWNYPLLQASWKIAPALATGCSLVMK
PSEITPLTTIRVFELMEEVGFPKGTINLVLGAGSEVGDVMSGHEEVDLVSFTGGIETGKH 
IMKQAANHVTDVALELGGKNPNIIFDDADFELAVDQALNGGYFHAGQVCSAGSRILVHND 
IKDKFEKALIDRVSKIKLGNGFDQDTEMGPVISTAHRDKIEGYMEVAKKDGATIAIGGKR 
PEREDLQAGLFFEPTVITDCDTSMRIVQEEVFGPVVTVEGFADEEEAIRLANDSIYGLAG 

AIFTKDIGKAQRVANKLKLGTVWINDFHPYFAQAPWGGYKQSGIGRELGKEGLEEYLVSK
HILTNTNPEPVDWFSK* 

Sequence 1031 

Contig_0547_pos_495_1253,
is similar to (with p-value 4.0e-46)

>gp:gp|AF007865|AF007865_3 Bacillus licheniformis bacitracin
synthetase operon including bacitracin synthetase 1 (bacA),
2 (bacB) and 3 (bacC) genes, complete cds. NID: g2982193.

gtggcatattatgaagcatcgcaattaaaatcaacaggtcaattaaaagatattttaagt
gaaacattacctgaatatatgatacctgtgcattttatgaaggtggatcgtatacctatc
acgatgaatgggaaattagatgtgcgtgcattacctgaaattaatctaaagaataataga 
aattatgtagaaccacgtaacgatattgaacgcacagtttgccgtattttcgaagagatt 
ttacatgttgatcaggtaggtgttaaagataatttctttgaactaggtggacactctctt 
agagcaacattagttgtaaaccgtattgaagaaaggttaaaaaaacgtcttaaagtaggt
gatttaatgaaatcgcctactgtagagcaacttggacaacaaattgaagaactgcaaaat 
gatgtctatgaagtgattcccaaagcaaatgaatcgtatcaatatgatttaagtgcgtct 
caaaaaagtatgtatcttttatggaaggtcaatcctaaagacacagtgtataacattcca 
ttcttatggagattatcttctgaacttaatgttatgcaattgcaacgtgcattatctaag 
ttgattgaacgtcatgaaatattacgaacacaatatgtaattgatgacaatgaagttaaa
caacgtattgcgacacatgtttcgcctgattttgaagaggtaaccgacatctctaacgaa
cgagcaagatattattcaatcatttatggaaccgtttga 

Sequence 1032 

VAYYEASQLKSTGQLKDILSETLPEYMIPVHFMKVDRIPITMNGKLDVRALPEINLKNNR
NYVEPRNDIERTVCRIFEEILHVDQVGVKDNFFELGGHSLRATLVVNRIEERLKKRLKVG 
DLMKSPTVEQLGQQIEELQNDVYEVIPKANESYQYDLSASQKSMYLLWKVNPKDTVYNIP 
FLWRLSSELNVMQLQRALSKLIERHEILRTQYVIDDNEVKQRIATHVSPDFEEVTDISNE 
RARYYSIIYGTV*


Sequence 1033 

Contig_0547_pos_1273_1977, 

is similar to (with p-value 2.0e-37)

>gp:gp|AF004835|AF004835_2 Brevibacillus brevis tyrocidine
biosynthesis operon, tyrocidine synthetase 1 (tycA), tyrocidine
synthetase 2 (tycB), tyrocidine synthetase 3 (tycC), putative
ABC-transporter TycD (tycD), putative ABC-transporter TycE (tycE)
and putative thioesterase GrsT homolog (tycF) genes, complete cds.
NID: g2623770.

atgcgagttaaatatatacatggaccacaacaagattatttatttatggatactcatcat
agtattaatgatggtatgagtaacacgattttactatctgatttgaacgctttataccaa 
gataaatcattacctgaacttaagcttcagtataaagattatagtgagtggatggtgcac 
agagacttatctaaacaacgtcacttttggttacagcaatttgaaaatcaggttccaata 
ttaaatatgcctacggattatcctagaccaagtattaaaacaaccaacggtaatatgttg

acgtttcattacaatcgtcaaatcaaacagcaattgaaatcttatgtagaacaacatcaa
gtgacagactttatgttctttgctagtgcaatcatggtattattgcacaaatatacacgt 
caggacgatatcgctattggtagtgtaatcagtgcgcgtactcatcgcgatactgaaaat 
atgttaggtatgtttgctaatacacttgtatatcgtggtcgaccacatgatcaaaagaca 
tgggatcaattgatggctgagatgaaagagatgtgtctaggggcatatgaacatcaagaa 

tatccttttgaaagcttagtcatctatgaatgggcctatttctccatccataacagcgtc
tactttaccagtttcttcattcgtacgatgatctttaatcattga 

Sequence 1034 

MRVKYIHGPQQDYLFMDTHHSINDGMSNTILLSDLNALYQDKSLPELKLQYKDYSEWMVH 
RDLSKQRHFWLQQFENQVPILNMPTDYPRPSIKTTNGNMLTFHYNRQIKQQLKSYVEQHQ
VTDFMFFASAIMVLLHKYTRQDDIAIGSVISARTHRDTENMLGMFANTLVYRGRPHDQKT 
WDQLMAEMKEMCLGAYEHQEYPFESLVIYEWAYFSIHNSVYFTSFFIRTMIFNH*

Sequence 1035 
Contig_0547_pos_5429_3171,

is similar to (with p-value 0.0e+00)

>sp:sp|O06446|SECA_STAAU PREPROTEIN TRANSLOCASE SECA SUBUNIT.
>gp:gp|U97062|SAU97062_1 Staphylococcus aureus NCTC 8325 SecA
(secA) gene, complete cds. NID: g2078389.
atgggtggtattgctatacataaaggtgatattgcagaaatgagaacaggtgaagggaaa
acattgactgcaaccatgccgacgtatttgaatgctttagctggtagaggtgtacatgtt 
attacagtcaatgaatatctatcaagttcacaaagtgaagaaatggctgaactatataac
tatcttggcttaactgtaggtttgaacttaaatagtaagtcaactgaagaaaaacgtgag 
gcttacgcacaagatatcacttatagtacgaataatgaacttgggtttgattatcttaga
gataatatggtgaactatgctgaagagagagtaatgcgtcctctacattttgcaattatt
gatgaggtcgattccatattgatcgacgaagcaagaacacctttaattatttctggtgaa 
gcggaaaaatctacttctttatatactcaagcaaatgtttttgcaaaaatgcttaaagcg 
gaagatgattataattatgatgaaaaaaccaaagctgtacatcttacagaacaaggtgca 
gataaagctgaacgtatgttcaaagtagataatctttatgatgttcaaaatgtggaagtg
attagtcatattaatacagctttaagagctcatgttactttgcaacgcgatgttgattac
atggtcgttgacggtgaagtattaattgttgaccaatttactggacgtacaatgcctgga 
cgtcgtttttctgaaggtttacaccaagcaattgaggctaaagaaggtgtagcaattcaa 
aatgagtctaaaacgatggcatccattactttccaaaactatttcagaatgtataataag 

ttagcggggatgactggtacagcgaaaaccgaagaggaagaatttcgtaatatctataat
atgacagttacccaaattccaacaaacaaacctgttcaacgtaaagataattcagactta 
atttatattagtcaaaaaggaaagtttgatgcggtagttgaagatgttgtagaaaaacat 
aaaaaaggacaacccgtcttactaggtactgttgctgttgagacttctgaatatatttca
aatttactaaaaaaacgtggtgtcagacatgacgtattaaacgctaaaaatcatgaacgc

gaagctgaaatcgtttcaaacgcggggcaaaaaggtgcagttacaattgccacaaatatg
gctggacgtggaacagatattaaacttggtgatggtgttgaagagttaggtggacttgct 
gttattggtactgagcgtcatgaatcaagacgtattgatgatcaattacgtggacgttca 
ggacgccaaggtgatagaggagatagtcgtttttacctatctttacaagatgaattaatg 
gtacgttttggttcagaacgcttacagaaaatgatgaaccgtttaggaatggatgattca 

acgccaatcgagtcgaaaatggtatctcgagctgtagaatcagctcaaaaacgagtagaa
ggtaataactttgacgcgcgtaaacgtattctagaatacgatgaagttttacgtaagcaa 
cgtgaaattatttataatgagcgtaatgaaatcattgatagtgaagaaagttctcaagtc 
gttaacgcgatgttacgttctacattgcaacgtgcgattaatcattttattaatgaagaa 
gacgataatcctgactacacgccatttatcaattacgttaatgatgtgttcttgcaagaa 

ggagatcttcaagatacagaaattaaaggtaaagattcagaagatatttttgaaattgta
tggtctaaaattgaaaaagcatatgcacagcaacaagaaacattaggagaccaaatgagt 
gaatttgagcggatgattttattacgttcaattgatacacattggactgatcatattgat 
acgatggatcaattgcgtcaaggtattcatttacgttcatatgcacaacaaaatccactt 
cgtgattatcaaaatgaaggtcatgaattatttgatatcatgatgcaaaatatcgaggaa

gatacatgtaagtatatcttgaaatcagtggttcagtttgaagatgatgtagaacgtgaa
aaatctaaaagctttggtgaagcaaaacatgtaactgctgaagatggcaaagaaaaagca 
aagccccaaccgattgtaaaaggtgatcaggtaggtagaaatgatccatgcccatgtggt 
agtggtaaaaaatataaaaattgtcatgggaaagcgtaa 

Sequence 1036

MGGIAIHKGDIAEMRTGEGKTLTATMPTYLNALAGRGVHVITVNEYLSSSQSEEMAELYN 
YLGLTVGLNLNSKSTEEKREAYAQDITYSTNNELGFDYLRDNMVNYAEERVMRPLHFAII 
DEVDSILIDEARTPLIISGEAEKSTSLYTQANVFAKMLKAEDDYNYDEKTKAVHLTEQGA 
DKAERMFKVDNLYDVQNVEVISHINTALRAHVTLQRDVDYMVVDGEVLIVDQFTGRTMPG 

RRFSEGLHQAIEAKEGVAIQNESKTMASITFQNYFRMYNKLAGMTGTAKTEEEEFRNIYN
MTVTQIPTNKPVQRKDNSDLIYISQKGKFDAVVEDVVEKHKKGQPVLLGTVAVETSEYIS
NLLKKRGVRHDVLNAKNHEREAEIVSNAGQKGAVTIATNMAGRGTDIKLGDGVEELGGLA
VIGTERHESRRIDDQLRGRSGRQGDRGDSRFYLSLQDELMVRFGSERLQKMMNRLGMDDS 
TPIESKMVSRAVESAQKRVEGNNFDARKRILEYDEVLRKQREIIYNERNEIIDSEESSQV 

VNAMLRSTLQRAINHFINEEDDNPDYTPFINYVNDVFLQEGDLQDTEIKGKDSEDIFEIV
WSKIEKAYAQQQETLGDQMSEFERMILLRSIDTHWTDHIDTMDQLRQGIHLRSYAQQNPL 
RDYQNEGHELFDIMMQNIEEDTCKYILKSVVQFEDDVEREKSKSFGEAKHVTAEDGKEKA
KPQPIVKGDQVGRNDPCPCGSGKKYKNCHGKA*

Sequence 1037

Contig_0547_pos_2539_1892, 

is similar to (with p-value 1.0e-78) 

>gp:gp|Z99122|BSUB0019_26 Bacillus subtilis complete genome
(section 19 of 21): from 3597091 to 3809700. NID: g2636029.
>gp:gp|AF013188|AF013188_1 Bacillus subtilis release factor
2 (prfB) gene, complete cds. NID: g2331286.
>gp:gp|AF017113|AF017113_1 Bacillus subtilis 300-304 degree
genomic sequence. NID: g2618830.

atgcttcttaggatgtatcaacgttactgtgaacaaaatggctttaaagttgaaacggtt 
gattatcttccaggagatgaagcaggcgttaaaagtgtcacattacttattaaaggacat
aatgcttatggttatttaaaggcagaaaaaggtgttcatcgtttagttagaatttcacct 
ttcgattcatctggtagacgccatacttcttttgcatcatgtgatgttattcctgatttt 
aataatgatgaaattgaaatcgagattaacccagatgatatcacagtggatacttttaga 
gcttcaggcgctggtggacaacatattaacaaaactgagtctgcaattagaattacacat
caccctacaggtattgtagtcaacaaccaaaatgaacgatctcaaataaaaaatagagaa 
gctgcaatgaaaatgttgaagtccaaactttatcaattaaagttagaagagcaagagcaa 
gaaatggctgaaattcgaggcgaacaaaaagacattggatggggaagtcagattcgttct 
tacgtctttcatccatattcaatgattaaagatcatcgtacgaatgaagaaactggtaaa 
gtagacgctgttatggatggagaaataggcccattcatagatgactaa

Sequence 1038 

MLLRMYQRYCEQNGFKVETVDYLPGDEAGVKSVTLLIKGHNAYGYLKAEKGVHRLVRISP 
FDSSGRRHTSFASCDVIPDFNNDEIEIEINPDDITVDTFRASGAGGQHINKTESAIRITH 
HPTGIVVNNQNERSQIKNREAAMKMLKSKLYQLKLEEQEQEMAEIRGEQKDIGWGSQIRS
YVFHPYSMIKDHRTNEETGKVDAVMDGEIGPFIDD*

Sequence 1039 

Contig_0548_pos_882_1841,

is similar to (with p-value 3.0e-70)

>sp:sp|P31114|GRC3_BACSU PROBABLE HEPTAPRENYL DIPHOSPHATE
SYNTHASE COMPONENT II (EC 2.5.1.30) (HEPPP SYNTHASE) (SPORE
GERMINATION PROTEIN C3). >gp:gp|M80245|BACVARGNS_5 B.subtilis
dbpA, mtr(A,B), gerC(1-3), ndk, cheR, aro(B,E,F,H), trp(A-F),
hisH, and tyrA genes, complete cds. NID: g143798.
>gp:gp|Z99115|BSUB0012_214 Bacillus subtilis complete genome
(section 12 of 21): from 2195541 to 2409220. NID: g2634478.
gtggcaaagttaaacattaacaacgaaataaagaaagtagaaaagcgacttgaagaagca 
attataagttctgatcaaacattacaagaagcctcattccatttactatcttcaggggga

aaaagagttagacccgcttttgttattttaagtggtcaatttggctctaacaacaaacct
tcagaagacacgtatcgtgtagcagtagctttagaactaattcacatggctaccttagtc 
cacgatgatgtgatagataaaagtgataaacgtagaggccgactcactatttcaaaaaaa 
tgggaccaaagtacagctattttaacaggaaatttcttacttgctatggggctcaagcat 
ttatctgaaatcagtgatactcgtgtccattcgaccatttctaaatcaattgttgatgtg 

tgtagaggagaactattccaatttcaagatcaatttaatagcaatcaaacaattactaat
tacttacgtcgtatcaaccgtaaaacagcacttcttattcaactgtctacacaagttggt 
gcgattacttccaatgcgtcaaatgacgttattcgtaaattaaaaatgatcggacattat 
ataggtatgagtttccaaataatagatgatgtgctagattttactagttctgaaaagaaa 
cttggtaagccggttggtagtgaccttatgaatggtcatattacattacctgtacta Lta 

gaaatgcgaaaaaataagacttttaaagataaaatttcacaacttaatcctgacagtcct
caacatgcctttgaaacttgtataacaataattagacagtccgaaagcatagaacaatca 
aaacaaataagtgaaaagtatttaaataaagcaatcaatttaatcgatgaattagaggat 
ggtcctaataaagaactatttagaaagcttattaaaaaaatgggaagtcgaaataagtaa 

Sequence 1040 

VAKLNINNEIKKVEKRLEEAIISSDQTLQEASFHLLSSGGKRVRPAFVILSGQFGSNNKP 
SEDTYRVAVALELIHMATLVHDDVIDKSDKRRGRLTISKKWDQSTAILTGNFLLAMGLKH
LSEISDTRVHSTISKSIVDVCRGELFQFQDQFNSNQTITNYLRRINRKTALLIQLSTQVG 
AITSNASNDVIRKLKMIGHYIGMSFQIIDDVLDFTSSEKKLGKPVGSDLMNGHITLPVLL
EMRKNKTFKDKISQLNPDSPQHAFETCITIIRQSESIEQSKQISEKYLNKAINLIDELED
GPNKELFRKLIKKMGSRNK* 

Sequence 1041 
Contig_0549_pos_673_1410,

is similar to (with p-value 5.0e-94) 

>sp:sp|Q02142|LEU2_LACLA 3-ISOPROPYLMALATE DEHYDRATASE (EC
4.2.1.33) (ISOPROPYLMALATE ISOMERASE) (ALPHA-IPM ISOMERASE)
(IPMI). >pir:pir|S35134|S35134 probable 3-isopropylmalate
dehydratase (EC 4.2.1.33) chain leuC - Lactococcus lactis
subsp. lactis >gp:gp|U92974|LLU92974_16 Lactococcus lactis
unknown gene, partial cds, and HisC (hisC), unknown, HisG
(hisG), unknown, HisB (hisB), unknown, HisH (hish), HisA
(hisA), HisF (hisF), HisIE (hisIE), unknown, unknown, LeuA
(leuA), LeuB (leuB), LeuC (leuC), LeuD (leuD), unknown, IlvD
(ilvD), IlvB (ilvB), IlvN, IlvC (ilvC), IlvA (ilvA), AldB
(aldB) and aldR (aldR) genes, complete cds. NID: g2565137.
atggaagcacgtatgacgatttgtaatatggctattgaagcaggagcaaagtatggttta 

atgcaacctgatgaaacaacctttaattacgtaaaaggtcgtccttatgctactgatttt
gatagttctatggcgtggtggaaagaactttattctgatgatgatgcctattttgataaa 
gttattgaacttgatgtaacaaatttagaacctcaagtaacttggggaactaacccagaa 
atgggagttagttttagtaatccattcccagaaattaaaaatgcaaatgaccaacgtgct 
tatgactatatgggacttcacccaggtcaaaaagccgaagatataaaattaggttatgtt 

tttttaggttcatgcacgaatgcaagattatctgatcttattgaagcaagtcatattatt
aaaggacaacaagttcatccaaatattactgctattgtggttccgggttcaagaactgtt 
aagaaggaagctgaagctctgggactagataaattatttaaagatgctggatttgagtgg 
cgtgaaccaggatgttctatgtgcttaggtatgaatccagatcaagttcctgaaggagta 
cattgtgcatccacgagtaatcgcaattttgaaggaagacaaggcaaaggcgctcgtaca

catttggtatcccctgctatggctgctgctgctgcgattaatggtaaattcattgatgtt
agaaaggtggtagtataa 

Sequence 1042 

MEARMTICNMAIEAGAKYGLMQPDETTFNYVKGRPYATDFDSSMAWWKELYSDDDAYFDK 
VIELDVTNLEPQVTWGTNPEMGVSFSNPFPEIKNANDQRAYDYMGLHPGQKAEDIKLGYV
FLGSCTNARLSDLIEASHIIKGQQVHPNITAIVVPGSRTVKKEAEALGLDKLFKDAGFEW
REPGCSMCLGMNPDQVPEGVHCASTSNRNFEGRQGKGARTHLVSPAMAAAAAINGKFIDV
RKVVV*

Sequence 1043

Contig_0549_pos_1498_1980,

is similar to (with p-value 6.0e-48) 

>sp:sp|Q02144|LEUD_LACLA 3-ISOPROPYLMALATE DEHYDRATASE SMALL
SUBUNIT (EC 4.2.1.33) (ISOPROPYLMALATE ISOMERASE) (ALPHA-IPM
ISOMERASE). >pir:pir|E36889|E36889 probable 3-isopropylmalate
dehydratase (EC 4.2.1.33) chain leuD - Lactococcus lactis
subsp. lactis >gp:gp|U92974|LLU92974_17 Lactococcus lactis
unknown gene, partial cds, and HisC (hisC), unknown, HisG
(hisG), unknown, HisB (hisB), unknown, HisH (hish), HisA
(hisA), HisF (hisF), HisIE (hisIE), unknown, unknown, LeuA
(leuA), LeuB (leuB), LeuC (leuC), LeuD (leuD), unknown, IlvD
(ilvD), IlvB (ilvB), IlvN, IlvC (ilvC), IlvA (ilvA), AldB
(aldB) and aldR (aldR) genes, complete cds. NID: g2565137.
gtgcatcttaagcgggtctctaaatcaggctttggaccttttgcttttgatgaatggcgt 

tacttacctgatggtagtgataatcctgattttaatcctaataaaccaaaatatcatggt
gcgtcaattctaattactggagataactttggttgtggttctagccgtgagcatgcagcg 
tgggccttaaaagattatggttttaacattattattgcaggaagttttagtgacatcttt 
tacatgaattgtactaaaaacgcaatgttacctatatgtttaaatcagaaagaaagagaa 
catttagctcaatttgatgaaataactgttgatttacctaatcaaacagtgtctacggtg 

tctcagtcttttcattttgatatagatgaaacctggaaaaataaattaatccatggctta
gacgatattgctattactttacaatttgaaaatttaatagaaaaatacgaaaaaactttt 
taa 

Sequence 1044 

VHLKRVSKSGFGPFAFDEWRYLPDGSDNPDFNPNKPKYHGASILITGDNFGCGSSREHAA
WALKDYGFNIIIAGSFSDIFYMNCTKNAMLPICLNQKEREHLAQFDEITVDLPNQTVSTV 
SQSFHFDIDETWKNKLIHGLDDIAITLQFENLIEKYEKTF* 

Sequence 1045 
Contig_0549_pos_1989_3263,

is similar to (with p-value 0.0e+00)

>gp:gp|U92974|LLU92974_23 Lactococcus lactis unknown gene,
partial cds, and HisC (hisC), unknown, HisG (hisG), unknown,
HisB (hisB), unknown, HisH (hish), HisA (hisA), HisF (hisF),
HisIE (hisIE), unknown, unknown, LeuA (leuA), LeuB (leuB),
LeuC (leuC), LeuD (leuD), unknown, IlvD (ilvD), IlvB (ilvB),
IlvN, IlvC (ilvC), IlvA (ilvA), AldB (aldB) and aldR (aldR)
genes, complete cds. NID: g2565137.

atgacagtgacagtaagaactaaagtttcgacaaaagatatagatgaagcatatttacgt
ctaaaaaatatagtaaaagaaactcccttacaattcgaccattacttatctcaaaaatat 
aattgtaatgtttatttaaaaagagaagatttacagtgggtacgatcctttaaattaaga 
ggagcttataatgctatttcagtattatccaatgaagaaaaaaataaaggtattacttgc 
gcaagtgctggaaatcatgctcaaggtgttgcttatactgccaaaaaactcaatttaaaa 

gctgttattttcatgccagtaactacaccacgacaaaaaatcaatcaagtcaaattcttc
ggggatagtaacgtagaaatagtattaattggcgatacatttgatcactgcttagcacaa 
gctttaaactatacgaagcaacataaaatgaattttattgacccatttaataatgtatat 
actattgcaggacaaggcactttagctaaggaaatattaaatcaagctgaaaaagaggat 
aaaacatttgattatgtatttgctgctataggtggtggcggtcttatttcaggagtgagc 

acatattttaaagcacattccccccatactaaaattattggtgttgaaccaaccggtgcc
agtagtatgtatcaatcagtcgttatcaaccatagtatagttactttagaaaatattgat 
aagtttgttgatggagcttcagtagcaagagttggtgatattacctttgatattgcgaaa 
gataaagtggatgattatgttcaagttgacgaaggagctgtttgctccacaattctggat 
atgtactctaaacaagcgattgttgctgaaccagctggtgctttaagtgtaagtgcctta 

gaacaatataaaaagcagattgaaaataaaactattgtatgcatagtaagtggaggcaac
aatgatattaatcgaatgaaagaaattgaggagcgttcccttctatttgaagaaatgaaa 
cattactttattttaaatttcccacaaagacctggtgctttaagagaatttgtcaatgat 
gtcctcggacctcaagacgatattacaaaatttgaatatttaaagaaaacatcacaaaac 
actggaactgttattataggtatacagctgaaacatcatgatgatctcattcagttaaaa 

gatcgcgtatgtcaatttgatccttctaatatttatatcaatgaaaataaaatgttatat
tcattacttatttaa

Sequence 1046 

MTVTVRTKVSTKDIDEAYLRLKNIVKETPLQFDHYLSQKYNCNVYLKREDLQWVRSFKLR
GAYNAISVLSNEEKNKGITCASAGNHAQGVAYTAKKLNLKAVIFMPVTTPRQKINQVKFF
GDSNVEIVLIGDTFDHCLAQALNYTKQHKMNFIDPFNNVYTIAGQGTLAKEILNQAEKED
KTFDYVFAAIGGGGLISGVSTYFKAHSPHTKIIGVEPTGASSMYQSVVINHSIVTLENID
KFVDGASVARVGDITFDIAKDKVDDYVQVDEGAVCSTILDMYSKQAIVAEPAGALSVSAL
EQYKKQIENKTIVCIVSGGNNDINRMKEIEERSLLFEEMKHYFILNFPQRPGALREFVND
VLGPQDDITKFEYLKKTSQNTGTVIIGIQLKHHDDLIQLKDRVCQFDPSNIYINENKMLY
SLLI* 

Sequence 1047 
Contig_0550_pos_4007_5077,

putative peptide of unknown function

atgaatttaagatcactagatacaaaagtagaggataataacactttatctgatgataag 
aaacaagcgcttaaacaagaaattgataagactaagcaaagtattgaccgacaaagaaat 
attattatagatcaactcaatggtgctagtaataaaaaacaagcaaccgaagatatctta 
aatagtgttttcagcaaaaatgaagtagaagacataatgaaacgtattaaaacaaatggc 

cgaagtaatgaagatattgctaatcaaattgccaagcaaattgatggtcttgcattaact
tctagtgatgatattttaaaatcaatgttagatcaatctaaagataaagaaagtttaatt 
aaacaattgttgacgacacgacttggtaatgatgaagcagatcgtattgctaaaaaattg 
ttaagccaaaacttgtcgaattctcaaatcgtagaacaattaaaacgtcatttcaatagt 
caaggaacagctacagctgatgatatattgaatggtgtgattaatgatgctaaagacaaa 

agacaagcgattgaaacaatattacaaacccgtatcaataaagacaaagctaaaattatc
gctgatgttattgcgcgtgtacaaaaggacaaatcagatatcatggatctcattcactct 
gcgattgaaggcaaggcaaatgatttattagatatagaaaaacgagcaaaacaagctaag 
aaagatttagaatatattttagatcctataaagaatagaccatccttgttagatcgtatt
aacaaaggtgtcggtgattctaattcaatatttgatagaccaagtttacttgataaactt 
cactcaagaggatctattcttgataaattagatcattcggcaccggagaatggattatct
ttagataataaaggtggccttttaagtgatctatttgacgacgatggtaatatctcatta 
ccagcgacaggtgaagtcatcaaacaacattggataccagtggctgttgtactcatgtca 
ttaggtggagcgctcatctttatggcgcgtagaaaaaaacaccaaaattaa

Sequence 1048

MNLRSLDTKVEDNNTLSDDKKQALKQEIDKTKQSIDRQRNIIIDQLNGASNKKQATEDIL
NSVFSKNEVEDIMKRIKTNGRSNEDIANQIAKQIDGLALTSSDDILKSMLDQSKDKESLI 
KQLLTTRLGNDEADRIAKKLLSQNLSNSQIVEQLKRHFNSQGTATADDILNGVINDAKDK
RQAIETILQTRINKDKAKIIADVIARVQKDKSDIMDLIHSAIEGKANDLLDIEKRAKQAK
KDLEYILDPIKNRPSLLDRINKGVGDSNSIFDRPSLLDKLHSRGSILDKLDHSAPENGLS 
LDNKGGLLSDLFDDDGNISLPATGEVIKQHWIPVAVVLMSLGGALIFMARRKKHQN*

Sequence 1049 
Contig_0550_pos_7700_10084,

is similar to (with p-value 5.0e-70) 

>sp:sp|P49022|PIP_LACLA PHAGE INFECTION PROTEIN.
>gp:gp|L14679|LACPIP_1 Lactococcus lactis pip and gerC2 genes,
complete cds's, and rrg gene, 5' end of cds. NID: g308860.

atgaaaaacgcactaaaactttttatcacggatttaaaaagagttgctaaaacaccaggt
gtatgggtcatcttagctggtttagcaattcttccttcattctatgcatggtttaacctc 
tgggctatgtgggatccgtatggtcatacaggacatatcaaagttgccgtagtgaatgaa 
gaccaaggtgaaaaagttcgtggtaagaatattaatgtaggaaataaaatggtcaaaact 
ttaaaaaagaatgatagttttgactggcaatttgtgagtagagaaaaagccgaccatgaa 

attaagatgggaaaatattatgcaggtatttatataccgaagaaattcacacatgaaatc
actggtactttaagaaaacatcctcaaaaggcggatatagattttaaagtaaatcagaag 
attaatgctgtagcagctaagttaaccgatacgggatcgtcgtttgtgattgataaagca 
aataaacaatttaacaaaaccgtagcaaccgctttactttctgaagctaataaagtcgga 
ctatcaattgaagataatgtacctacaatcaataaaattaagagtgctgtatatcaagct 

aataattcattgcctaaaattaatcaatttgcagacaagattattgaactaaataaacat
caagacgatttggatgcttatgctaatcaatttagaagtttaggaaagtataaagggaat 
gtattagacgctcaagaaaaacttaatgctgttaattcgtctattccggcgcttaatgaa 
agggctaaattgatacttgcacttgatagctacatgcctaatattgaaagaattttaaat 
gttgctgctaatgatgttccagcacaatttcctagaattaataggggtgtcgatattgca 

agtgaaggtattgatgcagcgagtggtcagttaaatgatgcaaaaggttatttgactcaa
gctaaagcgagagtgggagactatcaagaagcagctggccgcgctcaagatgtgaacaac 
caagcaaatcaaaatctaagaaatcaaacatcaactacaccccaaagcgctataaaatca 
tcgcattcggaagggaagagtcattcaagcattaaaacagtacctgtgagtcaatcaggt 
gagaatcaacccgtttatggtgataacattttatctaacagtgatgtaaaatcaatgaat 

acagctttaacagaagctttattatcattatctaatcaaacagatcaacaagcacaagct
acccaacaagacattaagtcattaaaaaatatagcatatggtgttatcgcttcagataaa 
ccatcagagtttaaagaaccattaaaaaatataaaatcacgcttagaaaacgcatctaag 
tataatcaacaatttatagatatcttgtcagagttggaaaaaagtgaacatgttgatcta 
tctaatgaaattaagcaagtgaaagaagcaaacaatagcattaatgataatttaaaaagt 

actaatcaattaatagatgcattgtcaaatggtagctccggacaattagaagcagtcaat
gtattacgtgacttacctaacttaaataaaaggttagatacattacgaaattacattaaa 
aaagaacttaatcgtaatttactagctgtttctaatgagattactgatcaacttaataaa 
ggtcaaaatacattatcgacaatccaatctaaattaaatactattaaccgagtgattaac 
gctggtcaagatattttaaatagcggtaaaaagagaattgatacgattcaaactgcattg

ccagcaatcgaaaacgcatatataaatgcaatgcgaactgcacaagcttacttcccaaca
gctaaaaaagatgtcgcgaaagctgcagactttgtacgtaatgacttgcctggattagag 
agagaattagctaatgtaacacagtctgtaaaccaaaaaataccatctttatttagtcgt 
tatgataatgctgtagatttattaaacgagaaacagcctcaagcaaaagaagcacttgct 
tcgcttgccgatttctcagaaaataaattgccagatgttgagaaagacttgaaaaaagca

aataaaatcttcaaaaagttagataaagatgatgctgtagataagctaatagatacattg
aaaaatgatttgaagaaacaggcagatattgttgctaaccctattaataaaaaaacgaca 
gatgtgttcccagtaaaagactatggttctggtatgacgccgttctatactgcattgtct 
atttgggttggaggattattaatggtcagcttattatccgatgtttccttttccgacttg 
ataaccgtacaagccatctgtcttacttgttttgtttttagataa 
Sequence 1050

MKNALKLFITDLKRVAKTPGVWVILAGLAILPSFYAWFNLWAMWDPYGHTGHIKVAVVNE
DQGEKVRGKNINVGNKMVKTLKKNDSFDWQFVSREKADHEIKMGKYYAGIYIPKKFTHEI
TGTLRKHPQKADIDFKVNQKINAVAAKLTDTGSSFVIDKANKQFNKTVATALLSEANKVG
LSIEDNVPTINKIKSAVYQANNSLPKINQFADKIIELNKHQDDLDAYANQFRSLGKYKGN 
VLDAQEKLNAVNSSIPALNERAKLILALDSYMPNIERILNVAANDVPAQFPRINRGVDIA 
SEGIDAASGQLNDAKGYLTQAKARVGDYQEAAGRAQDVNNQANQNLRNQTSTTPQSAIKS 
SHSEGKSHSSIKTVPVSQSGENQPVYGDNILSNSDVKSMNTALTEALLSLSNQTDQQAQA 

TQQDIKSLKNIAYGVIASDKPSEFKEPLKNIKSRLENASKYNQQFIDILSELEKSEHVDL
SNEIKQVKEANNSINDNLKSTNQLIDALSNGSSGQLEAVNVLRDLPNLNKRLDTLRNYIK 
KELNRNLLAVSNEITDQLNKGQNTLSTIQSKLNTINRVINAGQDILNSGKKRIDTIQTAL 
PAIENAYINAMRTAQAYFPTAKKDVAKAADFVRNDLPGLERELANVTQSVNQKIPSLFSR
YDNAVDLLNEKQPQAKEALASLADFSENKLPDVEKDLKKANKIFKKLDKDDAVDKLIDTL 

KNDLKKQADIVANPINKKTTDVFPVKDYGSGMTPFYTALSIWVGGLLMVSLLSDVSFSDL
ITVQAICLTCFVFR* 

Sequence 1051 
Contig_0550_pos_7412_5445,

is similar to (with p-value 6.0e-34)

>sp:sp|P37710|ALYS_ENTFA AUTOLYSIN (EC 3.5.1.28)
(N-ACETYLMURAMOYL-L-ALANINE AMIDASE). >pir:pir|A38109|A38109
autolysin - Enterococcus faecalis >gp:gp|M58002|STRHYDROLA_1
Streptococcus faecalis bacterial cell wall hydrolase gene,
complete cds. NID: g153658.

atgaagaaaaataaatttttagtatatttactatcgacggcgcttatcacgccaaccttc 
gctacacaaacagcttttgctgaagattcatctaataaaaatacaaattcagataaaatg 
gaacaacatcaatcacaaaaagaaacatcaaaacaatctgaaaaagatgaatttaacaac 
gatgattctaaacacgattctgatgataaaaaaagcacttctgacagcaaggacaaagac 

tctaataaaccattatcagctgactcaacacatcgtaactataaaatgaaagatgataat
ttagttgatcaactttatgataattttaagtctcagtcagtagatttttctaaatactgg
gaaccgaataaatacgaagacagttttagtttaacgtcactcatacaaaatttatttgat 
tttgattctgatataacagattacgaacagccacaaaagacaagccattcttctaatgac 
gaaaaagatcaagtagaccaagcagatcaggcaaaacaaccatcacaacatcaagaacaa 

tcacagtcgtctgctaaacaagatcaagaatcatcaaacgatgaaaaagaaaagacaact
aaccatcaagccgattctgacgtcagtgatttacttggagaaatggataaagaagatcaa 
gaaggcgaaaacgtagatacaaacaaaaatcaatcttcttctgagcaacaacaaactcaa 
gcgaatgatgatagctcagaacgtaacaagaaatattctagtattacagattcagcatta 
gactctatattagatgaatatagtcaggacgctaagaaaacagaaaaagattacaataag 

agcaagaatacaagtcacactaaaacatctcaaagtgataatgccgacaagaatccacaa
ttaccaacagatgatgaattaaaacatcaatcaaaacctgcacaatcatttgaggatgac 
attaaacgctcaaatacacgttcaacaagtcttttccaacaactacctgaattagacaat 
ggtgacttatcttctgattcatttaatgttgttgacagtcaagacacacgtgatttcatt 
caatcaattgctaaagatgcgcatcagattggaaaagaccaagatatatatgcatcagtt 

atgattgctcaagctattttagaatctgactctggaaaaagttcacttgcacaatcacca
aatcataacttgtttggaatcaaaggtgactacaaaggacaatctgtaacttttaatact 
ttagaagctgatagcagtaatcatatgttcagtatccaagcaggtttccgtaaataccca 
agtactaaacaatctcttgaagattatgcagatttaatcaaacatggtatcgatggtaat 
ccgtcaatttataaaccaacttggaagagtgaagctctatcatataaagatgctacttca 

catctgtcacgctcatacgccacagatcctaattattctaaaaaattaaatagtattatt
aaacattatcatttaacatcttttgacaaagaaaaaatgcctaacatgaagaaatacaac 
aaatcaataggtacggatgtctctggtaatgacttcaaaccatttactgaaacttccggt 
acatcaccttacccacatggccaatgtacttggtatgtgtaccaccgtatgaatcaattt 
gatgcatccatttctggtgacttaggtgatgctcataattggaataatcgtgctgaaagt 

gaaggctatacggtaacgcacacacctaaaaatcatactgcagttgtgtttgaagctgga
caattaggtgctgatacacagtatggtcatgttgctttcgttgaaaaagttaatgacgac 
ggttcaattgttatttctgaatcaaatgttaaaggattaggtgtcatttcattcagaact 
attgatgcagaagatgctcaagatttagattacattaaaggtaaatag 
Sequence 1052

MKKNKFLVYLLSTALITPTFATQTAFAEDSSNKNTNSDKMEQHQSQKETSKQSEKDEFNN 
DDSKHDSDDKKSTSDSKDKDSNKPLSADSTHRNYKMKDDNLVDQLYDNFKSQSVDFSKYW 
EPNKYEDSFSLTSLIQNLFDFDSDITDYEQPQKTSHSSNDEKDQVDQADQAKQPSQHQEQ 
SQSSAKQDQESSNDEKEKTTNHQADSDVSDLLGEMDKEDQEGENVDTNKNQSSSEQQQTQ
ANDDSSERNKKYSSITDSALDSILDEYSQDAKKTEKDYNKSKNTSHTKTSQSDNADKNPQ 
LPTDDELKHQSKPAQSFEDDIKRSNTRSTSLFQQLPELDNGDLSSDSFNVVDSQDTRDFI 
QSIAKDAHQIGKDQDIYASVMIAQAILESDSGKSSLAQSPNHNLFGIKGDYKGQSVTFNT
LEADSSNHMFSIQAGFRKYPSTKQSLEDYADLIKHGIDGNPSIYKPTWKSEALSYKDATS 
HLSRSYATDPNYSKKLNSIIKHYHLTSFDKEKMPNMKKYNKSIGTDVSGNDFKPFTETSG
TSPYPHGQCTWYVYHRMNQFDASISGDLGDAHNWNNRAESEGYTVTHTPKNHTAVVFEAG 
QLGADTQYGHVAFVEKVNDDGSIVISESNVKGLGVISFRTIDAEDAQDLDYIKGK* 

Sequence 1053 
Contig_0553_pos_3228_3920,

putative peptide of unknown function 

atgcttaaaatagagagattaaccaaatatatagacacgcaactgatatttaaagagata 
tcatgtacaattaacgaccagcacttactcataagtggggagagtggttgtggtaaatcc 
acattagccaagattatcgctggcttagatacggattatcagggcgaattatatcttaat 

gggcgcttacgtgaatcttatacgtctaaagagtggatgaagcacatccaatatgtacct
caatatcaacgtgatactttaaatcagcgtaaaacggtattagctacattattagaacca 
cttaagaattataaggtaaataaacagcgttatacatcaagcattgaagcagtgcttgat 
cagtgtaatttaccacacgatatacttaatcataaagtttcgacattaagtggtggccaa 
tttcaacgcgtctggatagctaaagctttaatattagaaccagagattctcatattggat 

gaagctacaaccaacttagatgtcattaatgaagaagctatacttcaaatgttgatttcc
ttaaagatgacacaattaatcattatttcacatgatacatacgtcttaagccaatttgaa
ggaattcatgactatcagtcgattccttttggttcttacgcaataagtcttcatctaatt 
ctgttaagggtacgagtgaaccatatcgtttaa 

Sequence 1054

MLKIERLTKYIDTQLIFKEISCTINDQHLLISGESGCGKSTLAKIIAGLDTDYQGELYLN
GRLRESYTSKEWMKHIQYVPQYQRDTLNQRKTVLATLLEPLKNYKVNKQRYTSSIEAVLD 
QCNLPHDILNHKVSTLSGGQFQRVWIAKALILEPEILILDEATTNLDVINEEAILQMLIS 
LKMTQLIIISHDTYVLSQFEGIHDYQSIPFGSYAISLHLILLRVRVNHIV* 

Sequence 1055 

Contig_0553_pos_5708_6910, 

is similar to (with p-value 0.0e+00) 

>gp:gp|AB009635|AB009635_1 Staphylococcus aureus DNA for Fmt,
complete cds. NID: g2696795.

atgaaattaaataaatttaaaatcgtattttttattatcgtattattaactttagtcgtg 
tcaataggtatattaggtgttgaatggacaagacacctagaattaaaaaaacaaacgtta 
agtcaagaaagtggaaatacgaattatatagaaaagagagataagactgttgagaaacct 
aaaaaaataaagactaaatatgataaaaaagatcctacttccaaatcgataaacaaatat 

ttagaaaaaactcaatttaatggaactgtagctgtatttgataatggaaaagttaaaatg
aataaaggatatggttatcaagatatagagaaaggcaaaaagaacactgcaaatacaatg 
tatttaataggatcagcgcaaaaatttacaacaggtttaatgctgaagcaacttgaagtc 
gaaaataaagtgaatttgcaagattcagtcactaaatatattccttggtttaaaacaaat 
aaagaaattacaattaaagatttaatgttacataaaagtggactatataaatatgaagct 

tcaactaatatcaaaaatttagaacaggctgttagagcaattcaagctcgaggtattgat
gatacagtttatcataagcatcaatataatgatgctaattatttagttttagctaaagtt 
attgaaaatgttactggaaaaccatatgttaaaaattattatgaacgattaggtaataaa 
tataatctcaaacatactgctttttatgacgagaaacctcttcaaagtgagatggcaaaa 
ggctataagtttaaaaataatactttttcattccttaaacctaatatattagatcaatat 

tatggagctggtaatttatatatgacgccacatgatatgggcaaattaatttatacgtta
caacaaaataaaatctttaatgcacgtcaaactcgacctattttacatgaatttggaact
caagaatatccagaagaatatagatatggtttttacataactccgtatttaaatagagtc 
aacggggtattctttggtcaaatttttactgtttactttaatgatcggtatattgtcatt 
ttagggacgaatgtaagtaatacacctggattagtgagtaatgaagacaaaatgagacac
attttctataatattcttgaccagaaaaagccttataatacagcaggtgttaaagttgag
taa 

Sequence 1056 

MKLNKFKIVFFIIVLLTLVVSIGILGVEWTRHLELKKQTLSQESGNTNYIEKRDKTVEKP
KKIKTKYDKKDPTSKSINKYLEKTQFNGTVAVFDNGKVKMNKGYGYQDIEKGKKNTANTM
YLIGSAQKFTTGLMLKQLEVENKVNLQDSVTKYIPWFKTNKEITIKDLMLHKSGLYKYEA 
STNIKNLEQAVRAIQARGIDDTVYHKHQYNDANYLVLAKVIENVTGKPYVKNYYERLGNK 
YNLKHTAFYDEKPLQSEMAKGYKFKNNTFSFLKPNILDQYYGAGNLYMTPHDMGKLIYTL 
QQNKIFNARQTRPILHEFGTQEYPEEYRYGFYITPYLNRVNGVFFGQIFTVYFNDRYIVI
LGTNVSNTPGLVSNEDKMRHIFYNILDQKKPYNTAGVKVE*

Sequence 1057 

Contig_0553_pos_11549_0,

putative peptide of unknown function

gtgttagaacgggaaacacatttacgcgtcatgcctttatttgaagaaaattattatatg
tatgtgcccaaatcacatccactagctatgactgtacatcccccgctatctcaatttaca 
aatcaatcactatactgtctcgaaccaatgacaagctcaataaaaagtaaattgattgaa 
aagactaaggcacaagtacgaatgatttcagatatgaaactcgctcaacatattttgagt 

cataataagggatttattatttctagtcaaaattctttactatatgatcacgtaaattgg
actaaaatccctttaaatcatacagaattaaaacgaatgctatgtgtagttatgcgaaaa 
gataacaagaaaaacgacattaatatagcatgga 

Sequence 1058 

VLERETHLRVMPLFEENYYMYVPKSHPLAMTVHPPLSQFTNQSLYCLEPMTSSIKSKLIE
KTKAQVRMISDMKLAQHILSHNKGFIISSQNSLLYDHVNWTKIPLNHTELKRMLCVVMRK 
DNKKNDINIAWX 

Sequence 1059 
Contig_0553_pos_10990_9044,

is similar to (with p-value 0.0e+00)

>sp:sp|P34956|QOX1_BACSU QUINOL OXIDASE POLYPEPTIDE I (EC
1.9.3.-) (QUINOL OXIDASE AA3-600, SUBUNIT QOXB) (OXIDASE
AA(3) SUBUNIT 1). >pir:pir|B38129|B38129 quinol oxidase
aa3-600 chain I - Bacillus subtilis >gp:gp|M86548|BACQOXA_2
Bacillus subtilis AA3-600 quinol oxidase (QOXA, QOXB, QOXC,
QOXD) genes, complete cds. NID: g143395.
>gp:gp|X73124|BSGENR_39 B.subtilis genomic region (325 to
333). NID: g413923. >gp:gp|Z99123|BSUB0020_111 Bacillus
subtilis complete genome (section 20 of 21): from 3798401 to
4010550. NID: g2636240.

atgattatctcagcacaaattgctgcgccattcttagtcatcggccttatagcagttata 
tcttatttcaaattatggaaatatctatataaagaatggttcacatccgtagaccataaa 
aaaatcggtatcatgtatttaatttctgccgtattaatgttcgttcgtggtggtatcgat 
gcgttaatgttacgtactcaattaacaattccagataacaaattcttggaagcaaaccac 

tataatgaagtatttactacgcacggcgtaattatgattatatttatggctatgccattt
atctttggtttatggaatgttgttattccattacaacttggtgcacgcgatgttgccttc 
cctgtaatgaataacgttagtttctggctattcttcgctggtatgattttattcaaccta 
tcatttattgtaggtggatcaccagctgctggttggactaactacgcaccacttgctggt 
gaattcagtccaggtccaggtgttaactattatttaattgcaattcaaatatctggtatc 

ggatcgttaatgactggtatcaacttctttgttacgattctaagatgtaaaactccaaca
atgaagtttatgcaaatgccaatgttcagtgtaacaacattcattacaacattaatcgtt 
atattagcattcccagtgttcactgtagcacttgctttaatgactgctgatagaattttt 
ggaactcagttcttcactgtagcaaatggcggtatgccaatgctttgggcaaacttcttc 
tgggtatgggggcaccctgaagtttatatcgttattttgccagcattcggtatgtactca 

gaaatcatccctacttttgcccgtaaacgtttattcggtcatcaaagtatgatttgggca
actgcaggtatcgcattcttaagtttcttagtttgggttcaccatttcttcactatgggt 
aatggtgcgttaattaactcattcttctctatctcaacaatgttaatcggtgttccaacc 
ggagttaaactatttaactggttgctcacattatacaaaggtagaattacatttgagtca 
cctatgctattctcattagcattcatccctaacttcttattaggaggggttactggtgta 
atgcttgcaatggcatcagctgactatcaatatcacaacacttatttcttagtagctcac
ttccactatacattggttactggtgtagtatttgcctgcttagctggtttaatcttctgg 
tatccaaaaatgatgggctacaagttaaatgaaacattaaacaaatggtgcttctggttc 
ttcatgatcggatttaacgtttgtttcttaccacaattcattctaggtttagatggtatg 
ccacgtcgtctatacacttacatgccttctgatggttggtggttactaaacttcatctca
actatcggtgcagtattgatggcaattggattcttattcctagttgcaagtatcgtttat 
agtcatatcaaagctccacgtgaagctactggagataactgggatggacttggtcgtact 
ttagaatggtctacagcatcagctattccacctaaatacaactttgctatcactcctgat 
tggaatgactacgatacattcgttgatatgaaagaacatggtcgtcattatttagacaac 
cataactacaaagatattcatatgccaaacaatactccagtaggattctggatgggtata
tttatgactattggtggtttcttcttaatcttcgaatctattgttccagcacttatctgt 
ttagcaggtatcttcattactatgatttggagaagtttccaaattgatcatggttaccac 
atccctgcttcagaagttgcagaaactgaagctcgtttaagagaagctcgaattaaagaa 
agggaggctgtaagtcatgagtcatga 

Sequence 1060 

MIISAQIAAPFLVIGLIAVISYFKLWKYLYKEWFTSVDHKKIGIMYLISAVLMFVRGGID
ALMLRTQLTIPDNKFLEANHYNEVFTTHGVIMIIFMAMPFIFGLWNVVIPLQLGARDVAF
PVMNNVSFWLFFAGMILFNLSFIVGGSPAAGWTNYAPLAGEFSPGPGVNYYLIAIQISGI 

GSLMTGINFFVTILRCKTPTMKFMQMPMFSVTTFITTLIVILAFPVFTVALALMTADRIF
GTQFFTVANGGMPMLWANFFWVWGHPEVYIVILPAFGMYSEIIPTFARKRLFGHQSMIWA 
TAGIAFLSFLVWVHHFFTMGNGALINSFFSISTMLIGVPTGVKLFNWLLTLYKGRITFES 
PMLFSLAFIPNFLLGGVTGVMLAMASADYQYHNTYFLVAHFHYTLVTGVVFACLAGLIFW 
YPKMMGYKLNETLNKWCFWFFMIGFNVCFLPQFILGLDGMPRRLYTYMPSDGWWLLNFIS 

TIGAVLMAIGFLFLVASIVYSHIKAPREATGDNWDGLGRTLEWSTASAIPPKYNFAITPD
WNDYDTFVDMKEHGRHYLDNHNYKDIHMPNNTPVGFWMGIFMTIGGFFLIFESIVPALIC
LAGIFITMIWRSFQIDHGYHIPASEVAETEARLREARIKEREAVSHES*

Sequence 1061 
Contig_0553_pos_8859_8449,

is similar to (with p-value 4.0e-41) 

>sp:sp|P34958|QOX3_BACSU QUINOL OXIDASE POLYPEPTIDE III (EC
1.9.3.-) (QUINOL OXIDASE AA3-600, SUBUNIT QOXC).
>pir:pir|C38129|C38129 quinol oxidase aa3-600 chain III -
Bacillus subtilis >gp:gp|M86548|BACQOXA_3 Bacillus subtilis
AA3-600 quinol oxidase (QOXA, QOXB, QOXC, QOXD) genes,
complete cds. NID: g143395. >gp:gp|X73124|BSGENR_40 B.
subtilis genomic region (325 to 333). NID: g413923.
>gp:gp|Z99123|BSUB0020_110 Bacillus subtilis complete genome
(section 20 of 21): from 3798401 to 4010550. NID: g2636240.

atgacttttgcattattaattagttcttatacttgtggtattgcaatttattacatgcga 
caagaaaaacaaaacttaatgatgttttggatgattatcacagttatcctaggtcttgta 
ttcgtaggtttcgaaatttacgaattcgcacactatgcttctgaaggtgttaacccaact 
attggctccttctggtctagtttctttatactactaggtacgcacggtgcacacgtatca 

ttaggtattgtttgggttatttgtttgttaattcaaatcggcactcgtggtttggattca
tacaatgctcctaaattatttatagtaagtttatactggcacttcttagatgttgtttgg
gtcttcatctttactgccgtatatatgatagggatggtgtatagcggatga 

Sequence 1062 

MTFALLISSYTCGIAIYYMRQEKQNLMMFWMIITVILGLVFVGFEIYEFAHYASEGVNPT
IGSFWSSFFILLGTHGAHVSLGIVWVICLLIQIGTRGLDSYNAPKLFIVSLYWHFLDVVW 
VFIFTAVYMIGMVYSG* 

Sequence 1063 
Contig_0553_pos_4902_3790,

is similar to (with p-value 0.0e+00) 

>gp:gp|U71377|SEU71377_1 Staphylococcus epidermidis autolysin
AtlE and putative transcriptional regulator AtlR genes,
complete cds. NID: g2267238.
atgaacaaatttataaaatattttttaatattattatcttttggtctcctcgttgttcca
attatttttgctactcaattatatcaaagttcagaatcggcatttgagtcatctcaaaac 
actaaagattctcaacgaaagtctactttaagagattcaaaagttgatcctgaaaaacaa 
cctatatcaattttattcttaggtatagacgataatgaaggtagagaaaaaaacgggcaa 
agtgtagaacattctaggtcagatgctatgatattatctacttttaatcagaaaaagcat
caaataagaatgcttagcatacctagagatactatcagttatatacctaaagttggctat 
tacgataaaataacacatgcgcatgcatatggtggacctcttgctgctatggactcagtt 
gaagcaacaatgaatgtaccggtagattattatgtgcgtattaatatgaaagcctttgtt 
gaagcagttgatgaattaggtggtatatattatgacgtaccatataacttaaatgaacct 

aacagtgatgatactggtagaattaaaataaaaaaaggataccaaaagctaaacggcgac
caagcattagctgtagctcgaactagacaccatgattcagaccttaaacgtggtcaaaga 
caaatggaacttattaaaatattgttccaaaaagctcaaaatttaaaatctatagataaa 
cttgacaatgttattagtattgtagggaaaaatgctaaacataatttaactcaaaaagaa 
attaagtctctagccaaaatgtatcttggtggtagtactgaaattaaaacatcacaactt 

aaaggtaaggatgactacttaaatgatatatactattaccacccaagcgtaaaaagtatt
atggaatattcaaatcttttacgtaatgatttagatttatctaaaataacaaacaaaaac 
gatttcttagatcaaagagtcattaaacgatatggttcactcgtacccttaacagaatta 
gatgaagacttattgcgtaagaaccaaaaggaatcgactgatagtcatgaattccttcaa 
attggcttaagacgtatgtatcatgtgaaataa 

Sequence 1064 

MNKFIKYFLILLSFGLLVVPIIFATQLYQSSESAFESSQNTKDSQRKSTLRDSKVDPEKQ
PISILFLGIDDNEGREKNGQSVEHSRSDAMILSTFNQKKHQIRMLSIPRDTISYIPKVGY 
YDKITHAHAYGGPLAAMDSVEATMNVPVDYYVRINMKAFVEAVDELGGIYYDVPYNLNEP
NSDDTGRIKIKKGYQKLNGDQALAVARTRHHDSDLKRGQRQMELIKILFQKAQNLKSIDK
LDNVISIVGKNAKHNLTQKEIKSLAKMYLGGSTEIKTSQLKGKDDYLNDIYYYHPSVKSI
MEYSNLLRNDLDLSKITNKNDFLDQRVIKRYGSLVPLTELDEDLLRKNQKESTDSHEFLQ 
IGLRRMYHVK* 

Sequence 1065

Contig_0553_pos_0_2476,

putative peptide of unknown function 

atgaatgcctatcaaattgaagaacttttttcacaagaaaatcttcaaaatgcagcacgt 
tcaggccgtccaattcaatttcttgtaggttttgatgttgaagatagccatcataaccct 

gaaactcttttaccagtaaatttatatgtaaaacctgagttaaaacatacaattgagtta
tatcacgataatgaaaaacaagatagaaaggaattttcagtatcgaaacgagcgggccat 
ggtgttttccaagtaatgagtggaacgcttcataacactgtaggatcaggaatattacct
tatcaacaagagatacgtatcaaacttactagtaatgaaccaattaaagatagtgaatgg 
tctattacaggatatcctaacacgcttacattacaaaacgctgtgggtagaacaaataat 

gctactgaaaaaaacttagctcttgttggtcatattgatccaggaaattatttcatcact
gttaagtttggtgataaagtagaacaatttgaaattagatcaaaaccaactccaccaaga 
atcattacaactgctaatgaattacgtggaaatcctaaccacaagcctgaaataagagta 
acagatataccaaatgatactactgctaaaatcaaacttgtgatgggcggaaccgatggt
gatcatgatccagaaataaatccatatactgtccctgaaaactacacagtagttgcagaa 

gcataccatgataatgatccaagtaaaaatggggtcttaacattccgttcatcagactac
cttaaagatctaccattaagcggtgaattaaaggcaattgtttattacaatcaatatgta 
caatcaaactttagtaatagcgttccgtttagtagcgatacaacaccacctacaattaat 
gaaccagcaggactagttcataagtattacaggggagatcatgtagaaattactcttcca 
gtcactgataatactggcggttcaggtttaagagatgtaaacgtcaatttacctcaaggt 

tggacaaaaacctttacaatcaatcctaataataatactgagggtacgcttaagttaatt
ggtaatatacctagtaatgaagcatataatacgacatatcatttcaatattactgcaacc
gataattctggaaatacaacaaatccagctaaaacctttattttaaatgttggtaagttg
gctgatgatttaaatccagtcggattatctagagatcaactacaattagtgacagaccct 
tcttcattatctaattccgaacgagaagaggtaaaaagaaaaataagtgaagcaaatgct 

aatataagatcatatttattacaaaataacccaatactcgctggagtaaacggcgatgtt
acattttattatagagatggttctgtagatgttattgatgctgaaaatgtaatcacatat 
gagcccgaaagaaaatccattttcagtgaaaatggtaatacaaataaaaaagaagcagta 
atcactattgctagaggacaaaactataccattggtccaaacttaagaaaatatttctca 
ttaagtaatggttcggatttacctaatagagatttcacctctatatcagctattggatct 
ttaccttcatcgagtgaaattagtcgactcaatgttggaaattataactatagagttaat
gctaaaaatgcttatcataagactcaacaagaacttaatttaaaacttaaaatagtagag 
gttaatgcacctactggtaataatcgtgtatatagagttagtacttataatttaactaat 
gatgaaatcaataaaatcaaacaagcatttaaagcagctaattctggacttaatttaaac 
gataacgatatcactgtttcgaataactttgaccatagaaatgttagtagtgtgacagta
actatacgtaagggcgatttgataaaagagttttcatcaaatctcaataatatgaatttc 
ttacgttgggttaatataagggatgattataccatttcgtggacttctagtaagattcaa 
ggtagaaatacagatggtggattagaatggtcaccagatcataaatcacttatttataaa 
tatgatgcaacattaggtagacaaataaatactaatgacgtgttaactttacttcaagca

acagctaaaaactcaaatttacgttcaaatatcaatagtaatgaaaaacagttagcagaa
cgagggtctaatgggtattctaaatctataattagagatgatggcgagaaatcttattta 
cttaactcaaatcctattcaagtattagacttagtagaaccagataatggttacggtgga 
cgtcaagtcagtcattctaacgttatatataatgaaaaaaattcttctatcgtaaatggt 
caagttccagaagctaatggggcatccgcttttaatattgataaagttgttaaagctaat 

gcggcaaataatggta

Sequence 1066 

MNAYQIEELFSQENLQNAARSGRPIQFLVGFDVEDSHHNPETLLPVNLYVKPELKHTIEL 
YHDNEKQDRKEFSVSKRAGHGVFQVMSGTLHNTVGSGILPYQQEIRIKLTSNEPIKDSEW 

SITGYPNTLTLQNAVGRTNNATEKNLALVGHIDPGNYFITVKFGDKVEQFEIRSKPTPPR
IITTANELRGNPNHKPEIRVTDIPNDTTAKIKLVMGGTDGDHDPEINPYTVPENYTVVAE 
AYHDNDPSKNGVLTFRSSDYLKDLPLSGELKAIVYYNQYVQSNFSNSVPFSSDTTPPTIN 
EPAGLVHKYYRGDHVEITLPVTDNTGGSGLRDVNVNLPQGWTKTFTINPNNNTEGTLKLI 
GNIPSNEAYNTTYHFNITATDNSGNTTNPAKTFILNVGKLADDLNPVGLSRDQLQLVTDP 

SSLSNSEREEVKRKISEANANIRSYLLQNNPILAGVNGDVTFYYRDGSVDVIDAENVITY
EPERKSIFSENGNTNKKEAVITIARGQNYTIGPNLRKYFSLSNGSDLPNRDFTSISAIGS 
LPSSSEISRLNVGNYNYRVNAKNAYHKTQQELNLKLKIVEVNAPTGNNRVYRVSTYNLTN 
DEINKIKQAFKAANSGLNLNDNDITVSNNFDHRNVSSVTVTIRKGDLIKEFSSNLNNMNF 
LRWVNIRDDYTISWTSSKIQGRNTDGGLEWSPDHKSLIYKYDATLGRQINTNDVLTLLQA 

TAKNSNLRSNINSNEKQLAERGSNGYSKSIIRDDGEKSYLLNSNPIQVLDLVEPDNGYGG
RQVSHSNVIYNEKNSSIVNGQVPEANGASAFNIDKVVKANAANNGX

Sequence 1067 
Contig_0554_pos_1606_3477,

is similar to (with p-value 0.0e+00)

>sp:sp|P17922|SYFB_BACSU PHENYLALANYL-TRNA SYNTHETASE BETA CHAIN
(EC 6.1.1.20) (PHENYLALANINE--TRNA LIGASE BETA CHAIN) (PHERS).
>pir:pir|S11731|YFBSB phenylalanine--tRNA ligase (EC 6.1.1.20)
beta chain - Bacillus subtilis >gp:gp|X53057|BSPHEST_2 B. subtilis
pheS and pheT genes for phenylalanyl-tRNA synthetase alpha and
beta subunits. NID: g40052.
atggtaggtactgcgtatgaagtcgcagctttatatcaaactaaaatgaataaacctcag 
ttaacaagcaatgaaagtcaagaatctgctaaagatgaattaacaatagaagttaaaaat 
gaagataaagcaccttactatagtgcacgtgttgttcatgacgtgactattggaccttct 

ccagtatggatgcagttccgattaattaaagcgggaatacgtccaattaataatgtggta
gatatttccaattatgtacttttagaatatggccaacctctacacatgtttgatcaagaa 
caaattggttcgcaatctatagaagttagacaagctaaaaaagatgagacaatgagaact 
ttagatggtgaagaacgtcgattgttagatactgacattgtcattacaaatggcaaagac 
cctattgcattaggaggtgttatgggaggagatttctctgaagtcactgaacaaacacga 

catgttgtagtagaaggggctatctttgatcctgtatctattcgacatacatcacgccgt
ttaaatttaagaagtgaatcttcgagtcgatttgaaaaggggattgcaactgaatttgtc 
gatgaagctgtagacagagcttgttatttacttgaaagatatgcttcaggaacagtttta 
aaagaccgagtttcgcatggagatttaggatcatttgtgaccccaatagaaattactgct
gacaaagttaaccgtacaattggttttaatttaactgatgaagaaatcattgatattttt 

gagcaattaggatttgacactgaaaataaaaatggtgaaattatcgtgaatgttccctca
agacgtaaagatatttctattaaagaagacttaatagaagaagtagcacgtatatatgga 
tacgatgacataccatcaacgctacctgtatttaaagatgttacaagtggagaactaaca 
gatcgacagtttaaaacgcgtactgttaaagaaacacttgagggcgctgggctagaccaa 
gctattacttattcattggtatcaaaaaatcatgctaccgattttgcactacaaaatcgt 
cctacaattgaactacttatgcctatgagtgaagcacataccacattacgtcaaagttta
ttaccgcatttaattgatgcagtatcatacaatgttgctcgtaaaaatacaaatgttaag 
ttatatgaaattggacgtgtcttctttggtaacggtgaaggtgagttaccagatgaagta 
gaatacttgagtggtatattaactggagattttgttaataacacttggcaaggtaagaaa 
gagtcagttgatttctatttaactaagggtattgttgaacgtattgctgaaaagcttaat
cttcaattcgattttagagctggtcaaatagatgggttacatccaggaagaacagcaatt 
gtgtcacttaatggtaaagatattggcttcataggtgagctacaccctacgttagctgca 
aacaatgatttaaagcgtacgtatgtatttgaacttaattatgatgcaatgatggaagtt 
tctgtgggatatattaattatgagcctatacctagatttccaggtgtaacacgtgatatt 
gcattagaagttaatcatgaagttacttcatctgaattgttatccattattcatgagaat
ggtgaagatattttaaatgatacactcgtatttgatgtatacgagggtgaacatttagaa 
aaagggaaaaaatctattgcaattagacttagttatctagatacagaaaacacacttacc 
gatgaacgtgtaaatgttgtgcatgataaaattttagaagcacttaaaaagcatggtgca 
attattagataa 


Sequence 1068 

MVGTAYEVAALYQTKMNKPQLTSNESQESAKDELTIEVKNEDKAPYYSARVVHDVTIGPS 
PVWMQFRLIKAGIRPINNVVDISNYVLLEYGQPLHMFDQEQIGSQSIEVRQAKKDETMRT
LDGEERRLLDTDIVITNGKDPIALGGVMGGDFSEVTEQTRHVVVEGAIFDPVSIRHTSRR 

LNLRSESSSRFEKGIATEFVDEAVDRACYLLERYASGTVLKDRVSHGDLGSFVTPIEITA
DKVNRTIGFNLTDEEIIDIFEQLGFDTENKNGEIIVNVPSRRKDISIKEDLIEEVARIYG
YDDIPSTLPVFKDVTSGELTDRQFKTRTVKETLEGAGLDQAITYSLVSKNHATDFALQNR 
PTIELLMPMSEAHTTLRQSLLPHLIDAVSYNVARKNTNVKLYEIGRVFFGNGEGELPDEV 
EYLSGILTGDFVNNTWQGKKESVDFYLTKGIVERIAEKLNLQFDFRAGQIDGLHPGRTAI 

VSLNGKDIGFIGELHPTLAANNDLKRTYVFELNYDAMMEVSVGYINYEPIPRFPGVTRDI
ALEVNHEVTSSELLSIIHENGEDILNDTLVFDVYEGEHLEKGKKSIAIRLSYLDTENTLT 
DERVNVVHDKILEALKKHGAIIR* 

Sequence 1069 
Contig_0554_pos_4982_5503,

putative peptide of unknown function 

atgctcattgatatagttgttcttcttattatttgttactttatagtgatagggtttcgt 
agaggtatttggttatcgatattgcactttgcttcttcaattgtatctttatatattgcg 
tcacaacattatcaatctattgcgcaacgtttagttgtctttgtgccatttccgaaaacg 

gtggcgtttgacatggtctatactattccttatgatcatttgcaatacagatttgaaaaa
gtgatagcatttattataatatttggtatgtgtaagcttattttgtatctagttgttgtt 
acatttgataatataataacgtataaaaagatacatttagtaagtcggatatcgagtgtc 
gttttgagtatcatagcggtttttatatatttacaaattggactttatttattatcgcta 
tatccgcattcatttatacagtaccaattatctcaatcgctattaagtcgagttgtgatt 

gaacaaattccttatttatcacaatttattttaaatttataa

Sequence 1070 

MLIDIVVLLIICYFIVIGFRRGIWLSILHFASSIVSLYIASQHYQSIAQRLVVFVPFPKT 
VAFDMVYTIPYDHLQYRFEKVIAFIIIFGMCKLILYLVVVTFDNIITYKKIHLVSRISSV 
VLSIIAVFIYLQIGLYLLSLYPHSFIQYQLSQSLLSRVVIEQIPYLSQFILNL*

Sequence 1071 

Contig_0554_pos_4456_3530,

is similar to (with p-value 2.0e-34)

>sp:sp|O07874|RNH2_STRPN RIBONUCLEASE HII (EC 3.1.26.4) (RNASE HII).
>gp:gp|U93576|SPU93576_1 Streptococcus pneumoniae ribonuclease HII
(rnhB) gene, complete cds. NID: g2209338.
atgggaaatgtcgtatacaaactcacgtcaaaagaaattcaatcattgatggctcaaact 
acttttgagacgacgaagttacctcaaggtatgaaagctcgtacgagatatcaaaatact 

gttatcaatatctatagttctggcaaagtaatgtttcaaggtaagaatgctgaacaactt
gcgagtcaattgctaccaaataaacaatcaacaactggcaaacatacatcatcaaataca
actagtattcaatataatcgttttcattgtattggaagcgatgaagcaggcagtggcgac 
tattttggtccattgactgtatgtgcagcttatgtgagccaatcacatatcaaaatctta 
aaagaacttggtgtagatgattcaaaaaaactaagcgatactaaaatcgtcgatcttgca 
gaacagctcattacctttatcccgcattctttattaacattagataatgttaagtataac
gaacgacaaagtctaggatggtctcaagttaaaatgaaagctgtcttacataatgaagct 
atcaaaaatgtgcttcaaaaaattgagcaagatcaactggattatattgttattgatcaa 
tttgcaaagcgagaagtttatcaacattatgcattatcagcattaccttttcctgacaaa 
acaaaatttgaaacaaaaggtgaatctaaatcactagcaatcgcggcagcaagcattatt
tctcgttatgcatttgttaaacacatggaccacatctctaaaaaactccatatggaaata 
ccaaaaggagcaagtaacaaagtagatttaattgccgctaaagtcattcaaaaatatgat 
attcaacaacttgatactatttcaaaaaaacattttaaaaacagagataaagcaattcat 
cttatgaatcaaaaatacaataaataa 


Sequence 1072 

MGNVVYKLTSKEIQSLMAQTTFETTKLPQGMKARTRYQNTVINIYSSGKVMFQGKNAEQL
ASQLLPNKQSTTGKHTSSNTTSIQYNRFHCIGSDEAGSGDYFGPLTVCAAYVSQSHIKIL 
KELGVDDSKKLSDTKIVDLAEQLITFIPHSLLTLDNVKYNERQSLGWSQVKMKAVLHNEA 
IKNVLQKIEQDQLDYIVIDQFAKREVYQHYALSALPFPDKTKFETKGESKSLAIAAASII
SRYAFVKHMDHISKKLHMEIPKGASNKVDLIAAKVIQKYDIQQLDTISKKHFKNRDKAIH 
LMNQKYNK* 

Sequence 1073 
Contig_0557_pos_329_844,

is similar to (with p-value 2.0e-86)

>pir:pir|D43258|D43258 galactose-6-phosphate isomerase subunit LacB
- Streptococcus mutans

atgaaaattgcaataggttgcgatcatattgttactgatacaaaaatggaagtttcacaa 
cacttaaaatcacagggacatgaagtgatagatgttggaacttatgatttcacacgtaca
cattatccgatttatggaaaaaaggtaggagaaaaagttgcgagtggtgaagcagattta 
ggtgtatgtatttgtggtactggtgtaggaattagtaatgctgcaaacaaagtaccaggt 
gttagaactgctttagttagagatatgacatcagcgctttattctaaagaagagttaaac 
gccaatgttgtaagttttggcggtaaagtagcaggtgaattatttattttcgacatcgtt 
gatgcattcattgaggcagagtacaaacctactgaagaaaataaaaaattaattgctaaa
atcaatcatttagaagcacataacaatgaccaagctgatccacatttcttcgacgagttc 
ttagaaaaatggaataaaggtgaatatcacgattaa 

Sequence 1074 

MKIAIGCDHIVTDTKMEVSQHLKSQGHEVIDVGTYDFTRTHYPIYGKKVGEKVASGEADL
GVCICGTGVGISNAANKVPGVRTALVRDMTSALYSKEELNANVVSFGGKVAGELFIFDIV 
DAFIEAEYKPTEENKKLIAKINHLEAHNNDQADPHFFDEFLEKWNKGEYHD* 

Sequence 1075 
Contig_0557_pos_1793_2770,

is similar to (with p-value 0.0e+00)

>sp:sp|P11100|LACD_STAAU TAGATOSE 1,6-DIPHOSPHATE ALDOLASE
(EC 4.1.-.-). >pir:pir|S04359|S04359 lacD protein - Staphylococcus
aureus >gp:gp|X14827|SALACCD_2 Staphylococcus aureus lacC and lacD
genes. NID: g46604.

atgacaaaatcacaacaaaaagtgtcatcaattgagaaattaagtaatcaagaaggtatt 
atttcagctttagcatttgatcaacgtggtgcattaaaaagaatgatggcagaacatcaa 
tctgaaacaccaacagttgaacaaatagaacaattaaaagtacttgtttctgaagaatta 
actcaatatgcgtcttcaattttattagatccagaatatggtttaccagcatcagatgct 

cgaaataatgactgcggactattacttgcatacgaaaaaactggatatgatgtgaatgcg
aaaggtcgtttgccagattgcttggtagaatggtctgcgaaacgtttgaaagagcaaggg
gccaatgcagttaaatttttactttattatgatgtagatgacacagaagaaattaacata 
caaaagaaagcatatattgaacgaattggttcagaatgtgttgccgaagatattcctttc 
ttcttggaagttttaacatatgacgacaatattcctgacaataaaagtgcagaattcgct 

aaagttaagccacgtaaagttaatgaagcaatgaagttattctctgaagatcgttttaat
gtggatgtacttaaagttgaagtacctgtgaatatgaattttgtggaaggattttcagaa 
ggagaagttgtttatactaaagaagaagctgcacaacatttccgtgatcaagatgcagct 
actcacttaccatatatttatttaagtgcaggtgtatcagcagaattgttccaagataca 
ttaaaatttgcgcatgattctggtgcgcaattcaatggtgttttatgtggacgtgccaca 
tggtcaggagcagttaaggtatacattgaagaaggagagcaagctgccagagaatggttg
cgtacggtaggatttaagaatattgatgatttgaatacagtattgaaaacaacagctaca 
tcatggaaaaacaaataa 

Sequence 1076

MTKSQQKVSSIEKLSNQEGIISALAFDQRGALKRMMAEHQSETPTVEQIEQLKVLVSEEL
TQYASSILLDPEYGLPASDARNNDCGLLIAYEKTGYDVNAKGRLPDCLVEWSAKRLKEQG 
ANAVKFLLYYDVDDTEEINIQKKAYIERIGSECVAEDIPFFLEVLTYDDNIPDNKSAEFA 
KVKPRKVNEAMKLFSEDRFNVDVLKVEVPVNMNFVEGFSEGEVVYTKEEAAQHFRDQDAA
THLPYIYLSAGVSAELFQDTLKFAHDSGAQFNGVLCGRATWSGAVKVYIEEGEQAAREWL
RTVGFKNIDDLNTVLKTTATSWKNK* 

Sequence 1077 

Contig_0557_pos_2790_3104,

is similar to (with p-value 2.0e-29)

>sp:sp|P02909|PTLA_STAAU PTS SYSTEM, LACTOSE-SPECIFIC IIA COMPONENT
(EIIA-LAC) (LACTOSE-PERMEASE IIA COMPONENT) (PHOSPHOTRANSFERASE
ENZYME II, A COMPONENT) (EC 2.7.1.69) (EIII-LAC).
>gp:gp|J03479|STALACS_1 S. aureus enzyme III-lac (lacF), enzyme
II-lac (lacE), and phospho-beta-galactosidase (lacG) genes,
complete cds. NID: g153036.

atgaatagagatgaggtacaattactcggatttgaaattgttgcctatgctggggatgca 
cgttcaaaattattagaagctttaaatgctgctaaagatagtgaatttgataaagcagaa 
caacttgtagaggaagcgaatgaatgtattgctaatgcacataaagcacaaaccaatctt 
ctagctcaagaggctaaaggcgaggatatcgcatatagtatcactatgattcatggtcaa
gaccatttaatgacaacattacttttaaaagatttaatgaagcatttaattgaattatac 
aaaaaagggagctga 

Sequence 1078 

MNRDEVQLLGFEIVAYAGDARSKLLEALNAAKDSEFDKAEQLVEEANECIANAHKAQTNL
LAQEAKGEDIAYSITMIHGQDHLMTTLLLKDLMKHLIELYKKGS* 
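
Each nucleotide record in this listing is followed by its conceptual translation, so the pairing can be cross-checked mechanically. The following is a minimal sketch, assuming the standard genetic code read in frame 1 with `*` marking a stop codon; the names `translate` and `CODON_TABLE` are illustrative and not part of the listing. Start codons are translated literally here, which matches records such as Sequence 1084, where a `gtg` start is listed as V.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a nucleotide string codon by codon in reading frame 1."""
    # Drop whitespace so sequences copied with line breaks or OCR gaps still work.
    seq = "".join(dna.lower().split())
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    # Codons with unrecognised characters are reported as X.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# First and last lines of Sequence 1077, compared with Sequence 1078.
print(translate("atgaatagagatgaggtacaattactcggatttgaaattgttgcctatgctggggatgca"))
print(translate("aaaaaagggagctga"))
```

The two printed strings correspond to the first twenty residues and the final `KKGS*` of Sequence 1078. A mismatch between a translated record and its listed peptide usually points to a character-recognition error in one of the two lines.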

Sequence 1079 
Contig_0557_pos_3110_4858,

is similar to (with p-value 0.0e+00)

>sp:sp|P11162|PTLB_STAAU PTS SYSTEM, LACTOSE-SPECIFIC IIBC COMPONENT
(EIIBC-LAC) (LACTOSE-PERMEASE IIBC COMPONENT) (PHOSPHOTRANSFERASE
ENZYME II, BC COMPONENT) (EC 2.7.1.69) (EII-LAC).
>pir:pir|B28474|B28474 phosphotransferase system enzyme II
(EC 2.7.1.69), lactose-specific, factor II - Staphylococcus aureus
>gp:gp|J03479|STALACS_2 S. aureus enzyme III-lac (lacF), enzyme
II-lac (lacE), and phospho-beta-galactosidase (lacG) genes,
complete cds. NID: g153036.
atgaataaattaatagcatggatagaaaaaggaaagccattctttgaaaaaatatcacga 

aatatttatttaagagcgattcgtgatggatttattgctgctattccaattatcttattc
tcaagtatatttattttaattacctatgtaccaaatgtgtttggttttacttggagtaaa 
actatggaaggtatattgatgaaaccctataactatacaatgggaatagttggtttgctt 
gtagcaggaaccacagctaaatctttaactgattcttacaatcgaaaactagataaaacg 
aatcagattaactttatttcgacaatgatggcagctatttgtggatttttattcttagct

gctgatcctgttaaagatggtggattttcaagtgcatttatgggaacaaaaggtttattg
acagcctttatttctgcgtttattaccgtgattgtttataatttctttgtcaaaagaaat 
attaccattaaaatgcctaaagaagtaccaccaaatatatctcaagtatttaaagatatt 
ttccctttatcagccgtaattttaattttgtatgctttagacttactttctagagcaata 
gtccacacgaatgtagcaaatgcagtattaaaagtatttgagccactatttactgcggca 

gatggttggattggggtaacactcatattcggtgcgtttgcgttcttctggtttgtaggt
attcatggaccttctattgttgaaccagcgattgcagcaattacttatgcgaaccttgaa 
acaaatttacacttaatacaagctggagaacatgctgataaagtaattacaccgggtaca 
cagatgttcgtagcaactatgggaggaaccggtgcaacattagttgttccatttatgttt 
atgtggttaacaaaatcaaaaagaaataaagcgataggtagagcatcagtcgtacctaca 
ttctttggtgtcaatgaacccatactttttggtgcaccactagtactaaatccggtattc
tttataccttttatttttgcacctatagtaaatatatggatttttaaattttttgttgat 
gttttaaatatgaatagttttagtatctttttaccttggactactcctggtccactcggt 
attgttatggggactggatttgcattttggtcatttgtgttagcaatattacttattgtt 
gttgatgtgattatttactatccattcttaaaagtatacgatgaacaagtgcttgaagaa
gaattaggaaataaagaagcaaataatgaattaaaagaaaaagtatcagcaaactttgat 
acgaaaaaagccgatgctattttagcaactgcaggggcaagtgaagcggatactgatgat 
acatcttcagttgatgaaacaacttctacatcctctacagatactattagtgaacaaaca 
aatgttttagttttatgtgcaggtggaggtacaagtggtttactagctaatgctttaaat 
aaagctgctgaagagtatgaagtaccagtaaaagcagcagcaggtggttatggtgcacat
atggatattatgaaagattatcaattaattatcttagcaccacaagttgcttcgaatttt 
gaagatattaaacaagatactgatcgcttaggaattaaattagccaaaactgaaggcgct 
caatatatcaagttaacaagagacggtgaggcggctttagaatttgtaaaacaacaattt 
aacaattaa 


Sequence 1080 

MNKLIAWIEKGKPFFEKISRNIYLRAIRDGFIAAIPIILFSSIFILITYVPNVFGFTWSK
TMEGILMKPYNYTMGIVGLLVAGTTAKSLTDSYNRKLDKTNQINFISTMMAAICGFLFLA 
ADPVKDGGFSSAFMGTKGLLTAFISAFITVIVYNFFVKRNITIKMPKEVPPNISQVFKDI 

FPLSAVILILYALDLLSRAIVHTNVANAVLKVFEPLFTAADGWIGVTLIFGAFAFFWFVG
IHGPSIVEPAIAAITYANLETNLHLIQAGEHADKVITPGTQMFVATMGGTGATLVVPFMF
MWLTKSKRNKAIGRASVVPTFFGVNEPILFGAPLVLNPVFFIPFIFAPIVNIWIFKFFVD
VLNMNSFSIFLPWTTPGPLGIVMGTGFAFWSFVLAILLIVVDVIIYYPFLKVYDEQVLEE
ELGNKEANNELKEKVSANFDTKKADAILATAGASEADTDDTSSVDETTSTSSTDTISEQT 

NVLVLCAGGGTSGLLANALNKAAEEYEVPVKAAAGGYGAHMDIMKDYQLIILAPQVASNF
EDIKQDTDRLGIKLAKTEGAQYIKLTRDGEAALEFVKQQFNN* 

Sequence 1081 

Contig_0557_pos_4874_6286,

is similar to (with p-value 0.0e+00)

>sp:sp|P11175|LACG_STAAU 6-PHOSPHO-BETA-GALACTOSIDASE (EC 3.2.1.85)
(BETA-D-PHOSPHOGALACTOSIDE GALACTOHYDROLASE) (PGALASE) (P-BETA-GAL)
(PBG). >pir:pir|A27233|A27233 beta-galactosidase (EC 3.2.1.23) -
Staphylococcus aureus >gp:gp|J03479|STALACS_3 S. aureus enzyme
III-lac (lacF), enzyme II-lac (lacE), and
phospho-beta-galactosidase (lacG) genes, complete cds.
NID: g153036.

atgactaagaaattacctgatgactttatttttggtggagcaaccgctgcttatcaagca 
gaaggagctactcagactgatggtaaagggcgtgtcgcttgggacacgtatttagaggag

aattattggtacacagctgaaccagcaagtgatttttataacagatatcctgttgacttg
gaattaagtgaacgctttggtgtaaatggtatacgtatctcaattgcttggtctcgtatt 
tttcctaaaggttacggtgaagtgaatcaaaaaggtgtcgagtattatcataatcttttc 
aaagaatgtcataaacgtcatgttgaaccttttgtaacattacatcactttgacacacca 
gaggtacttcacaaagatggagacttcttaaatcgtaaaacaatagactattttgtagat 

tatgctgaattttgttttaaagaatttccagaagttaagtattggacaacattcaatgaa
attgggccgattggtgatggtcaatatttagttggtaaattccctccaggtatcaaatat 
gactttgaaaaagtattccaatctcatcataatatgatggttgcacacgcacgtgctgtt 
aaactttttaaagatgaaaattataagggagaaataggtgttgtccatgcattacctaca 
aaatatccatatgatccatctaatcctgaagatgtgagagcagccgaacttgaagacatt 

attcataataaatttattttagatgcaacataccttggtaagtactcacgtgaaacgatg
gaaggagtacaacacatcttatctgtgaatggtggtcaattagagatttctgatgaagac 
tacaaaattttagatgaagctaaggatttaaacgatttcttaggtattaattattatatg 
agtgactggatgcgtggttttgaaggcgaatctgaaataacacataatgccactggtgat
aaaggtggatctaagtatcaacttaaaggtgtaggacaacgtgaatttgatgttgatgtt 

cctagaaccgattgggattggatgatttatccacaaggtttatatgaccaaattatgcgt
gtagtaaaagattatccgaattatcataagatttatattactgaaaatggattaggatat 
aaagatgtattcgacgaaaaagaaaaaacagtacatgacgatgcacgaattgactatatt 
aaacagcatctaagtgtgatagcagatgcgattgcagatggtgccaatgttaagggatac 
ttcttatggtctcttatggatgtattttcatggtcaaatggttatgaaaaaagatacggt 
ttattctacgttgattttgaaacacaagaaagattccctaagaaaagtgcatattggtac
aaagaacttgcagaaagtaaagaaattaaataa 

Sequence 1082 

MTKKLPDDFIFGGATAAYQAEGATQTDGKGRVAWDTYLEENYWYTAEPASDFYNRYPVDL
ELSERFGVNGIRISIAWSRIFPKGYGEVNQKGVEYYHNLFKECHKRHVEPFVTLHHFDTP 
EVLHKDGDFLNRKTIDYFVDYAEFCFKEFPEVKYWTTFNEIGPIGDGQYLVGKFPPGIKY 
DFEKVFQSHHNMMVAHARAVKLFKDENYKGEIGVVHALPTKYPYDPSNPEDVRAAELEDI 
IHNKFILDATYLGKYSRETMEGVQHILSVNGGQLEISDEDYKILDEAKDLNDFLGINYYM 
SDWMRGFEGESEITHNATGDKGGSKYQLKGVGQREFDVDVPRTDWDWMIYPQGLYDQIMR
VVKDYPNYHKIYITENGLGYKDVFDEKEKTVHDDARIDYIKQHLSVIADAIADGANVKGY 
FLWSLMDVFSWSNGYEKRYGLFYVDFETQERFPKKSAYWYKELAESKEIK*

Sequence 1083 
Contig_0557_pos_6797_7423,

putative peptide of unknown function 

gtgagtagtgaaggaaaagtatattttgataaaaaattaagtgaagatgcagcaaaccct 
attgtcaaagtagaatttaaagataataaaaatggaaattttaaagaaaatgcttattgg 
attaaagaagttctatcacaactaaaaagtcaatttggaattcaacaatttaattttgta 

ggacattcaatggggaacatgtcatttgctttttacatgaaaaattatggggacgatcga
catttgccacaacttaaaaaggaagttaatatagcgggagtttataacgggattttgaat 
atgaatgagaacgtgaatgaaattatcgttgataaacaggggaaaccaagtagaatgaat 
gccgcatatcggcaattgttatcactgcataagatttattgtggtaaggaaatagaagtt 
ttaaatatctacggagatttagaagatggctcacattcagatggacgtgtgtcaaatagc 

tcttctcaatcgcttcaatatttactaagaggtagcactaagtcttatcaagaaatgaaa
tttaaaggtgcaaaggcacaacatagtcaattacatgagaataaagatgttgcaaatgaa 
atcatacaattcttatgggaaacttaa 

Sequence 1084 

VSSEGKVYFDKKLSEDAANPIVKVEFKDNKNGNFKENAYWIKEVLSQLKSQFGIQQFNFV
GHSMGNMSFAFYMKNYGDDRHLPQLKKEVNIAGVYNGILNMNENVNEIIVDKQGKPSRMN
AAYRQLLSLHKIYCGKEIEVLNIYGDLEDGSHSDGRVSNSSSQSLQYLLRGSTKSYQEMK
FKGAKAQHSQLHENKDVANEIIQFLWET* 

Sequence 1085

Contig_0557_pos_8545_7553,

is similar to (with p-value 2.0e-42)

>sp:sp|P39606|YWCH_BACSU HYPOTHETICAL 36.6 KD PROTEIN IN QOXD-VPR
INTERGENIC REGION. >pir:pir|S39699|S39699 hypothetical protein -
Bacillus subtilis >gp:gp|X73124|BSGENR_45 B.subtilis genomic
region (325 to 333). NID: g413923. >gp:gp|Z99123|BSUB0020_105
Bacillus subtilis complete genome (section 20 of 21): from 3798401
to 4010550. NID: g2636240.
atgaagaggagatgtgtgttcatgagattaagtttacttgattatgtcccgttgttcgaa

gggcgtaccccaaatgacgccctaaagcatagtattaaattagcccaacacgctgagaaa
cttgggtacttacgatactgggttgcagaacatcatcaagtttattctgtcgtttctagt 
gcacctgaaataataatgatgtcgattttagaacacacacaacacatcagagttggtagt 
ggaggtgtgatgttaccacattatagtccttataaagtagctgagcaatttaaaattatg 
gaagcaagacacccccaacgtatcgatatggctatcggacgttcgccaagctttaaaaat 

gttaatgcagcactaaatgaaaacaaaaatgaaaaattaccattcaatactcagattact
gatttgcttaaatacttcaataacgatacaactcaagaccatcgttttaaatcattatta 
gctacacctatggttacttcatttcctcaactatatattttaggtatgagtaatagaagc 
gcaaaattagctgctcagcgcggactaccttttgttattgcacgaatgggacaatctgag 
acagaccttcatgaagctataagcacttatagaaaatattttaaagcttatcatggtgaa 

attaataatgcgaaaccatatgttattttagcaacttttgtggtaacagcttctaattta
tctagagttaaacaattgctacatacgcttcaactttggttgatgcgtattaactattta 
aatcaacctaagagttatccatcgattgaaacagcacagaacaagcattatagtcaacga
gaattagaaaagcttgaaaagatgaaatcgaaaatcatatacgaatgccaaatgatgttg 
cggaacaacttaccttacttcatcaacaatttaaagtggatgaaatcatcatcttacctc 
atgtatttggtgaagacgctagaatggaattaa

Sequence 1086 

MKRRCVFMRLSLLDYVPLFEGRTPNDALKHSIKLAQHAEKLGYLRYWVAEHHQVYSVVSS 
APEIIMMSILEHTQHIRVGSGGVMLPHYSPYKVAEQFKIMEARHPQRIDMAIGRSPSFKN
VNAALNENKNEKLPFNTQITDLLKYFNNDTTQDHRFKSLLATPMVTSFPQLYILGMSNRS 
AKLAAQRGLPFVIARMGQSETDLHEAISTYRKYFKAYHGEINNAKPYVILATFVVTASNL
SRVKQLLHTLQLWLMRINYLNQPKSYPSIETAQNKHYSQRELEKLEKMKSKIIYECQMML
RNNLPYFINNLKWMKSSSYLMYLVKTLEWN* 


Sequence 1087 

Contig_0557_pos_2850_2530,

is similar to (with p-value 9.0e-35)

>sp:sp|P11100|LACD_STAAU TAGATOSE 1,6-DIPHOSPHATE ALDOLASE
(EC 4.1.-.-). >pir:pir|S04359|S04359 lacD protein - Staphylococcus
aureus >gp:gp|X14827|SALACCD_2 Staphylococcus aureus lacC and lacD
genes. NID: g46604.

gtgcatccccagcataggcaacaatttcaaatccgagtaattgtacctcatctctattca 
tttgaatatcctcccttacattatttgtttttccatgatgtagctgttgttttcaatact 
gtattcaaatcatcaatattcttaaatcctaccgtacgcaaccattctctggcagcttgc
tctccttcttcaatgtataccttaactgctcctgaccatgtggcacgtccacataaaaca 
ccattgaattgcgcaccagaatcatgcgcaaattttaatgtatcttggaacaattctgct 
gatacacctgcacttaaataa

Sequence 1088

VHPQHRQQFQIRVIVPHLYSFEYPPLHYLFFHDVAVVFNTVFKSSIFLNPTVRNHSLAAC
SPSSMYTLTAPDHVARPHKTPLNCAPESCANFNVSWNNSADTPALK*
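
The record labels in this listing encode a contig number and two positions, for example Contig_0557_pos_2790_3104 and Contig_0557_pos_2850_2530. The following is a minimal sketch of how such a label could be parsed. It assumes that the two positions are inclusive coordinates and that a start greater than the end denotes the reverse strand; both assumptions are inferred from the record lengths in the listing, not stated by it. `parse_contig_label` and `ContigSpan` are hypothetical names.

```python
import re
from typing import NamedTuple


class ContigSpan(NamedTuple):
    contig: str
    start: int
    end: int
    strand: str  # "+" when start <= end, "-" otherwise (assumed convention)
    length: int  # inclusive span length in nucleotides (assumed convention)


def parse_contig_label(label: str) -> ContigSpan:
    """Parse a label of the form Contig_<id>_pos_<start>_<end>."""
    # Remove stray whitespace and trailing punctuation left by text extraction.
    cleaned = re.sub(r"\s+", "", label).rstrip(",/")
    match = re.fullmatch(r"Contig_(\d+)_pos_(\d+)_+(\d+)", cleaned)
    if match is None:
        raise ValueError(f"unrecognised contig label: {label!r}")
    contig, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    strand = "+" if start <= end else "-"
    return ContigSpan(contig, start, end, strand, abs(end - start) + 1)


print(parse_contig_label("Contig_0557_pos_2790_3104,"))
print(parse_contig_label("Contig_0557_pos_2850_2530,"))
```

Under these assumptions the two labels give spans of 315 and 321 nucleotides, which agree with the lengths of Sequence 1077 and Sequence 1087 as printed.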

Sequence 1089 
Contig_0558_pos_11130_10378,

is similar to (with p-value 0.0e+00)

>pir:pir|S19723|S19723 dihydrolipoamide dehydrogenase (EC 1.8.1.4)
- Staphylococcus aureus >gp:gp|X58434|SAPDHDNA_3 S.aureus pdhB,
pdhC and pdhD genes for pyruvate decarboxylase, dihydrolipoamide
acetyltransferase and dihydrolipoamide dehydrogenase. NID: g48871.

atgacacaacctgttaaaaaaggtatgaaagaaaaaggtatcgaaatcgttactgaagca 
atggcaaaatctgcagaagaaactgaaaatggtgtcaaagtaacttatgaggcaaaaggt 
gaggaacaaactatcgaagctgattatgtattagttacagttggccgtcgccctaatact 

gatgaattaggattagaagaacttggtctgaaatttgctgatcgtggattactagaagtg
gacaaacaaagtcgtacttctattgaaaatatctttgcgattggagatattgtacctgga
ttaccattagctcacaaagctagttatgaaggtaaagttgctgctgaagcgatagatggt 
caagccgcagaggtagactatattggtatgccagcagtttgctttacagaaccagaatta 
gcacaagttggttatactgaagctcaagcaaaagaagaaggtttatcaattaaagcttct 

aaattcccttatgcagctaatggacgagctttatcattagatgatacaaatggttttgtt
aagttaattacacttaaagaagatgatacgcttattggagcacaagttgtaggtactggc 
gcatctgatattatctctgaattaggtttagctattgagtcaggtatgaatgctgaagat 
atcgcattaactgtacatgcacacccaactttaggtgaaatgacaatggaagctgctgaa 
aaagcaattggttatccaattcatactatgtaa 


Sequence 1090 

MTQPVKKGMKEKGIEIVTEAMAKSAEETENGVKVTYEAKGEEQTIEADYVLVTVGRRPNT 
DELGLEELGLKFADRGLLEVDKQSRTSIENIFAIGDIVPGLPLAHKASYEGKVAAEAIDG
QAAEVDYIGMPAVCFTEPELAQVGYTEAQAKEEGLSIKASKFPYAANGRALSLDDTNGFV
KLITLKEDDTLIGAQVVGTGASDIISELGLAIESGMNAEDIALTVHAHPTLGEMTMEAAE
KAIGYPIHTM* 

Sequence 1091 
Contig_0558_pos_9659_9120, 
putative peptide of unknown function

atggatattggatacaaattacgtaatttaagaagaataaaaaatttgacacaagaggaa 
ttagcagagcgaactgatttatcaaaaggatatatatcacaaattgaaagtaatcatgct 
tcacctagtatggaaacatttttaaatttaatagaagtacttggtacttctgcaagtgac 
ttttttaaagaaccgtcagatgagaaggtactttataagaagaaggaacagaccatttat
gatgagtatgataaaggttatatcttgaactggcttgtagcgaattctaatgaatttgac 
atggaaccattaatcctaactttacgaccaaatgcctcatataaaaactttaaaccatct 
gaatcagatacttttatctattgtttaaatggtgaagtatcacttcaattaggaaatcaa 
gtatataaagcttgtaaagaagatgtactttattttaaagcgaaagataaacatcgctta 
tataacgaaacagataaagaagtgaaggttttaatcgttgccacagcttcatatttatag



Sequence 1092 

MDIGYKLRNLRRIKNLTQEELAERTDLSKGYISQIESNHASPSMETFLNLIEVLGTSASD
FFKEPSDEKVLYKKKEQTIYDEYDKGYILNWLVANSNEFDMEPLILTLRPNASYKNFKPS
ESDTFIYCLNGEVSLQLGNQVYKACKEDVLYFKAKDKHRLYNETDKEVKVLIVATASYL* 



Sequence 1093 
Contig_0558_pos_9108_8014,

is similar to (with p-value 5.0e-68)

>gp:gp|AF077856|AF077856_1 Actinobacillus actinomycetemcomitans
putative polyamine transport operon, complete sequence.
NID: g3341853.

atgaatccattgctttcttttaaagatgtcagtaagggctttgaagatgtacaaatacta
aatgaaattaatattgatattgaaccaggctatttttatacactattaggtccctcaggt 
tgtggaaaaacaacaattttaaaactcatagcaggatttgaatatcccgatagtggagat 
attatatataaagataaacctattggtaaaatgccaccgaataagcgtaaggtaaatact 
gtattccaagactatgcattgtttccacatttaaatgtattcgacaatattgcatatggt

ttaaaattaaaaaaattaagtaagtcagaaattaagcgtaaggttactgaagcacttcag
ttggtgaaattaagtggttatgaacataggcaaatacaaggtatgagtggtggacaaaaa 
caacgtgtagccatagcacgggcaattgttaatgagcctgaaatattattattagatgag 
tctttatccgcattagatttaaaattacgaactgaaatgcaatatttattgagagaactt 
caatcccgtttaggtataacctttatatttgtaactcatgatcaagaagaggccttagca 

ttaagtgattatatttttgttatgaaagatggcaaaattcaacaatttggcacaccaata
gatatatacgatgaaccagttaaccgatttgttgctgattttataggagagtccaacata 
gttcacggtacaatggttgaagattttgtcgttaatatttatggtcaaaattttgattgt 
gtagatatgggaataaaagaaaataaaaaagttgaagttgtaattagacccgaagacatt 
tcacttgtttcacaaaatgatgggctatttaaagccaaagttgattctatgctatttaga 

ggtgtacattatgaaatttgttgtaaagatagaaaagggtatgaatgggtaatacaatca
acaaaaaaagctaatgtaggtagtgaagtaggtctgtattttgaaccagaagcaatacac 
atcatggtaccaggtgaaactgaagaagaatttgataagcgaattgaaagttatgaggac 
tatcatcatgcataa 

Sequence 1094

MNPLLSFKDVSKGFEDVQILNEINIDIEPGYFYTLLGPSGCGKTTILKLIAGFEYPDSGD 
IIYKDKPIGKMPPNKRKVNTVFQDYALFPHLNVFDNIAYGLKLKKLSKSEIKRKVTEALQ 
LVKLSGYEHRQIQGMSGGQKQRVAIARAIVNEPEILLLDESLSALDLKLRTEMQYLLREL 
QSRLGITFIFVTHDQEEALALSDYIFVMKDGKIQQFGTPIDIYDEPVNRFVADFIGESNI

VHGTMVEDFVVNIYGQNFDCVDMGIKENKKVEVVIRPEDISLVSQNDGLFKAKVDSMLFR
GVHYEICCKDRKGYEWVIQSTKKANVGSEVGLYFEPEAIHIMVPGETEEEFDKRIESYED
YHHA* 

Sequence 1095 
Contig_0558_pos_7859_7218,

is similar to (with p-value 1.0e-29)

>sp:sp|P45170|POTB_HAEIN SPERMIDINE/PUTRESCINE TRANSPORT SYSTEM
PERMEASE PROTEIN POTB. >pir:pir|A64118|A64118
spermidine/putrescine transport system permease protein (potB)
homolog - Haemophilus influenzae (strain Rd KW20)
>gp:gp|U32813|U32813_11 Haemophilus influenzae Rd section 128 of
163 of the complete genome. NID: g1574796.

atgtttattgattcaatatggtatgccgctttaattactatgattaccttaataataagt 
tacccagctgcgtactttatttcttattcaagatttcaaaatatactgcttatgttgtta
attatccctacttggattaatttacttcttaagacctatgcatttattggtttgttggga 
catgatggagttattaaccaagctctacatatatttcaaatacctaaattaaatttgttg 
tttacaagtggtgcatttttattggtggcgagttatatttatatcccatttatgattttg 
cctatatttaacagcatgaaagcaattcctaacaatattttgcaagcctctaatgatttg 
ggcgcgagtacatttactacgtttcgtaaagtaatcgttcctttaacaagagaaggtatt
aaaacaggtgtgcaagtaacatttataccagctctttcactgtttttgattactaggttg 
attgccgggaacaaagtaatcaatgtaggtacagcaattgaagaacagttcttaactata 
caaaattatggattaggttccactatagcactttttctcattatttttatggccttttta 
ctcattattacaaaatcaaaatcatctaatgggaaggggtga 


Sequence 1096 

MFIDSIWYAALITMITLIISYPAAYFISYSRFQNILLMLLIIPTWINLLLKTYAFIGLLG 
HDGVINQALHIFQIPKLNLLFTSGAFLLVASYIYIPFMILPIFNSMKAIPNNILQASNDL
GASTFTTFRKVIVPLTREGIKTGVQVTFIPALSLFLITRLIAGNKVINVGTAIEEQFLTI 
QNYGLGSTIALFLIIFMAFLLIITKSKSSNGKG*

Sequence 1097 

Contig_0558_pos_1609_272, 

is similar to (with p-value 4.0e-63)

>sp:sp|P37536|YAAO_BACSU HYPOTHETICAL 53.2 KD PROTEIN IN XPAC-ABRB
INTERGENIC REGION. >gp:gp|D26185|BAC180K_91 B. subtilis DNA, 180
kilobase region of replication origin. NID: g467326.
>gp:gp|Z99104|BSUB0001_27 Bacillus subtilis complete genome
(section 1 of 21): from 1 to 213080. NID: g2632267.

atgaaaagaccaataattcaaaaattaaatcacttgatagagaaaaaagctatctctatg
catgttcctggacataaaaacatgacaatcggctacttaaataggcttgatttagcaatg 
gatatgacagaaattactggattagatgatatgcattatcctgaaggaattattttagaa 
agcatggagaattttaggaaacataaaaactatgatgctttcttattagttaacggaacg 
acttcaggtatattatcggttatccaagcgttttcgacaagaaaaggtaaatatttaatt 

agtagaaatgttcataaatcagtatttcacggtttagacataacacaacaacaagcgaca
ataactaagacagatgtcagtaagaaaacgaatcaatatgtaaatccaaagataaatcaa 
gataaaaatcaatattataaacttgccatctgtacataccctaattattacggtgaaact 
tttgatatttctcaatatatcaaacaattacatcacagagggataccgatattagtagat 
gaagcgcatggtgcacattttggtttatatggatttccagaatcctcaatgaattttaat 

gctgattacgttgtgcagtcatatcacaaaacactccctgcactaacaatgggatcagtt
atatatatacataaagatgcaccattaagacaacaagtcatagattatttaacttatttc 
caaacgtcaagtccttcgtacctcattatgtctagtttagaattagcgaataaattttat 
aaagaatatgattctacattatttgaccaacgaagaaagatgttaattgatttattagta 
aatatgggatttacagttatagaaccagaggatcctttaaaattggttgtgagttttgag 

ggtgttgaaggttatgatgtgcaaaaatggtttgaggataaagaaatttatgtagaatta
gctgatatgtatcaagtgttactcgttctccccctatggcatgaaggagataaatttcct 
tttaagttgttgattgaaaaaattagagaaattaacgtgccaaaaaaatgtacgcgcgac 
ataaaacctcttaattttatgacgggttttagcgaatacaaaactgttcattttcaaaat 
acgaaagaagtgtctattaaaagggcagaaggtaaagttttagcacaacatatcgttcca 

taccctccaggtataccggtgatgtttaaaggagaagtcgtgacgtctcatatgatagac
ttattaaataaatatgataaacaaaatattaaagttgaaggtttaaatcataaaaaaata 
ttagttaaggatgaataa 

Sequence 1098 

MKRPIIQKLNHLIEKKAISMHVPGHKNMTIGYLNRLDLAMDMTEITGLDDMHYPEGIILE
SMENFRKHKNYDAFLLVNGTTSGILSVIQAFSTRKGKYLISRNVHKSVFHGLDITQQQAT 
ITKTDVSKKTNQYVNPKINQDKNQYYKLAICTYPNYYGETFDISQYIKQLHHRGIPILVD 
EAHGAHFGLYGFPESSMNFNADYVVQSYHKTLPALTMGSVIYIHKDAPLRQQVIDYLTYF
QTSSPSYLIMSSLELANKFYKEYDSTLFDQRRKMLIDLLVNMGFTVIEPEDPLKLVVSFE 
GVEGYDVQKWFEDKEIYVELADMYQVLLVLPLWHEGDKFPFKLLIEKIREINVPKKCTRD
IKPLNFMTGFSEYKTVHFQNTKEVSIKRAEGKVLAQHIVPYPPGIPVMFKGEVVTSHMID 
LLNKYDKQNIKVEGLNHKKILVKDE*

Sequence 1099

Contig_0559_pos_867_1328, 

putative peptide of unknown function 

gtgaaaagtggcaaagcacgagcacatacaaatattgcgttgattaagtattgggggaaa 
gctgatgaaacttacattattcctatgaataatagtttatcagttaccttagatagattt 

tatactgaaacaaaagtgacatttgaccctgattttactgaagattgccttattttaaat
ggtaatgaagtgaatgccaaagagaaagaaaagattcaaaactatatgaatatagtgaga
gatttggctggaaatcgtttgcatgcgcgaattgaaagtgaaaattatgtgccaactgaa 
caatcaaaagaaaaacaagctaatgaacaagcaaaagcgcaaaatctttttgctcgctgg 
agaaaagaagagcgtttttgctatacgacttactatcactttttctctattttcaaatgt 

tggatgcatagacgctcctttgactgtataagaagcaaataa

Sequence 1100 

VKSGKARAHTNIALIKYWGKADETYIIPMNNSLSVTLDRFYTETKVTFDPDFTEDCLILN 
GNEVNAKEKEKIQNYMNIVRDLAGNRLHARIESENYVPTEQSKEKQANEQAKAQNLFARW 
RKEERFCYTTYYHFFSIFKCWMHRRSFDCIRSK*

Sequence 1101 

Contig_0559_pos_2914_3561, 

is similar to (with p-value 5.0e-52)

>sp:sp|P42423|YXDL_BACSU HYPOTHETICAL ABC TRANSPORTER ATP-BINDING
PROTEIN IN IDH 3' REGION. >gp:gp|D14399|BACIOLO_13 Bacillus
subtilis 15 kb chromosome segment contains the iol operon.
NID: g709980. >gp:gp|Z99124|BSUB0021_68 Bacillus subtilis complete
genome (section 21 of 21): from 3999281 to 4214814. NID: g2636442.
>gp:gp|D45912|D45912_2 Bacillus subtilis genome sequence between
the iol and hut operon, partial and complete cds. NID: g1408482.

atgggtccttctggatcaggtaaaacgactttactcaatgtgttaagttcaatagatact 
atttcagaaggaactgtggaagttgaaggcaaagaaattaataaactgagccacaaagaa 

gtggcaaattttcgaaaacaacatctcggttttatttttcaagattatagcgttttaccc
acattaacagtaaaagaaaatattatgctaccactctcagtacaaaaattccataaatat 
gaaatggaacaaaattataaagaagtggctgaggcattaggtatttataacctgggaaat 
aaatatccaagtgaaatttctggcggtcagcaacaacgtacggcggcagcccgggcattc 
gtccataaaccaacgattattttcgcagatgaacctactggcgcattagattctaaaagt 

gctcaagatttgttacaccgtctagaagatatgaataaacaatttaattcaaccattatg
atggtgacacatgatccttcagccgctagttacgctgagagagtcattatgttgaaagac 
ggtgatatacactcagaaatctaccagggtaacgattcaaaacaaacattttaccaagaa 
attatgaaacttcaaaccgcattaggtggtgtcagtcatgacatttaa 

Sequence 1102

MGPSGSGKTTLLNVLSSIDTISEGTVEVEGKEINKLSHKEVANFRKQHLGFIFQDYSVLP 
TLTVKENIMLPLSVQKFHKYEMEQNYKEVAEALGIYNLGNKYPSEISGGQQQRTAAARAF 
VHKPTIIFADEPTGALDSKSAQDLLHRLEDMNKQFNSTIMMVTHDPSAASYAERVIMLKD 
GDIHSEIYQGNDSKQTFYQEIMKLQTALGGVSHDI*


Sequence 1103 

Contig_0559_pos_3857_0, 

putative peptide of unknown function 

atgctcaatatagaacaactagtattttttattgtaacaggaattttaggcactttaatt 
ggtatttttggttcaaaacttttacttgttatcgcttctaaattaatgaagttaaacaca
catatctctattggctttgaaccccaagctatacttattactatcgtaatgttagctgtc 
gcttttttattgataatgatacaaaattacattttcttaaaaaaacacagcattttagct 
ttgatgaaagacaattataccccggaagctacccaaaaacggataactacgtttgaagca 
atcggcggcattttaggaattataatgatagtatttggatattatatgtctactgaaatg 
tttggtgtttttaaagccttaacaactgctttgattacaccttttagcatacttttctta
actattgttggtgctttcttattctttagaagttctgtatcacttatttttaaaacacta 
aaacatattaaacatggtcgtgtaaatatcacagatgttgtctttacatcatctatcatg
cacagaatgaagaaaaatgcgatgtctctcacagttattgctatcatttcagctttcacg 
5 gttagtat tctttgcttcgcggcaattacacaatctaatactaatacaactttagaaatg 
acctctccagatgattttaatataagccagaataaaatagctgcgcaatttaaacataaa 
ctagatcaaggaaatttaaaatatcatcagcggacttatgaagtaatcaatccaaaaaca 
ttaagcgaccacgtcatgaagagtaaaaatggttctgatatgtctactaatacaacatca 
ctaatgatgaactcacatctcaaaggtcatgaagctaaaataacgaatatacaatcatc3 
acaggattaatagatattcatttaaatcataagattacagttaaaggaaaatctaaacaa
tctattatcgttaaagaca

Sequence 1104 

MLNIEQLVFFIVTGILGTLIGIFGSKLLLVIASKLMKLNTHISIGFEPQAILITIVMLAV 
AFLLIMIQNYIFLKKHSILALMKDNYTPEATQKRITTFEAIGGILGIIMIVFGYYMSTEM
FGVFKALTTALITPFSILFLTIVGAFLFFRSSVSLIFKTLKHIKHGRVNITDVVFTSSIM 
HRMKKNAMSLTVIAIISAFTVSILCFAAITQSNTNTTLEMTSPDDFNISQNKIAAQFKHK 
LDQGNLKYHQRTYEVINPKTLSDHVMKSKNGSDMSTNTTSLMMNSHLKGHEAKITNIQSS 
TGLIDIHLNHKITVKGKSKQSIIVKDX 


Sequence 1105 

Contig_0559_pos_2535_2134,

putative peptide of unknown function 

atggagaaaaggagtattaatatgaaaaaagtatttatgatcataagtatacttaccata
actgttactttaagtgcatgtggaggttctggaaaacaaaaagagccatctaaggaaagt
caaaaatctgataaatatgattatgtttattatgaaatattaaatgatggagattctgaa 
acgccaaatgttgagattaaatataaagataaaaaaggtaaatcacatatagaaaaagct 
gatttagatcacgtgtatgaacatatactaggtgatggtaataaaaaaccatatattgta 
aaggatgggaagaaaattcatgtatatcgaccaccatatatgatttatggtgatgatgat 
gttgaaggcaaagccgtttcgaaagatgaagttacgaagtaa

Sequence 1106 

MEKRSINMKKVFMIISILTITVTLSACGGSGKQKEPSKESQKSDKYDYVYYEILNDGDSE
TPNVEIKYKDKKGKSHIEKADLDHVYEHILGDGNKKPYIVKDGKKIHVYRPPYMIYGDDD
VEGKAVSKDEVTK*

Sequence 1107 

Contig_0559_pos_2082_1474, 

putative peptide of unknown function 

atggtacttctcacgtcttctttaagtattgtcagtacatattctcatgcaacaacgtca
ggaggaacgagtagttccagttcggcaagttctagttcaagtagcagtgcagcttctgca 
tctagaggttcaacttcttcaagtacaagtatgagtcgttctagtgcaataaatgcgtct 
cgcaatgcacaacaatctagtcagcgtgctgcccaacaagcaacaaaatcaagtcgtgta 
acagcaaccaaaaataaaggacaacaaagtgtatcaagacaaaaagcacaatctcgttct

ttgatgccgtctcaaagaccttatgattcaagtgcaccatactcatctcaatatattgct
acaacttattataataattggttattctattatatttttgcacattcgtttttaaatcaa 
catgaaaagaaaaacagtgtagatgctcagtttaatatgttgaaacaacaaatgaagcct 
catgagaaactttatactgttactgtaaagactaaacaaggaaagcgtgtcgttgttgta 
cctaaaaaacaatatgacaaaattgaaaaaggaaaacacattaaagttaaaaatggtgtt 

gttcagtaa

Sequence 1108 

MVLLTSSLSIVSTYSHATTSGGTSSSSSASSSSSSSAASASRGSTSSSTSMSRSSAINAS 
RNAQQSSQRAAQQATKSSRVTATKNKGQQSVSRQKAQSRSLMPSQRPYDSSAPYSSQYIA 
TTYYNNWLFYYIFAHSFLNQHEKKNSVDAQFNMLKQQMKPHEKLYTVTVKTKQGKRVVVV
PKKQYDKIEKGKHIKVKNGVVQ*

Sequence 1109 
Contig_0561_pos_316_1254,
is similar to (with p-value 0.0e+00)

>sp:sp|P45557|PRMA_STAAU PROBABLE METHYLTRANSFERASE (EC 2.1.1.-).
>gp:gp|D30690|STANHS_5 Staphylococcus aureus genes for ORF37;
HSP20; HSP70; HSP40; ORF35, complete cds. NID: g487326.

atgaattggatggaactctcaattgtagttaatcacgaagtagaatacgatgttacagaa 
attcttgaaagttatggctctaatggagttgtaattgaagattcaaatattttagaagaa 
caacctattgataagtttggagaaatttatgacttaaaccctgaagactatcctgaaaaa 
ggagttcgattaaaagcttactttaatgagttcacttataatgaaaacttaaaatccaac 

atcaattatgaaatattaagtcttcagcaaattgataaaacaatttatgattaccaggaa
aaacttattgccgaagtagattgggaaaatgaatggaagaattattttcatccatttaga
gcttcaaaacaatttacgatagtaccaagttgggaatcatatgttaaagaaaatgataac 
gaattgtgcattgaattagatccaggtatggcttttggaacaggtgatcatccaacgaca 
agtatgtgtttaaaagcaattgaaacttttgtaaaaccaactgattcagttatcgacgtt 

ggaacagggtcaggcattttaagtattgctagtcatttacttggagttcaaagaataaag
gcattagatatagatgaaatggctgtaaatgtggcaaaagaaaactttaagaaaaatcat 
tgtgatgatgcaattgaagcagttccaggtaatttattaaaaaatgaaaatgagaaattt 
aatatcgttattgcaaatattcttgctcatattattgaagaaatgattgaagatacttat 
aatactttaattgaagatggttattttatcacatcaggtattattgaagaaaagtatcaa

gatatagaatcacaaatgaagcgtattggtttcaaaattatttcagtagaacatgacaat
ggctgggtttgtatagttggtcagaaagtgagtggataa 

Sequence 1110 

MNWMELSIVVNHEVEYDVTEILESYGSNGVVIEDSNILEEQPIDKFGEIYDLNPEDYPEK 
GVRLKAYFNEFTYNENLKSNINYEILSLQQIDKTIYDYQEKLIAEVDWENEWKNYFHPFR
ASKQFTIVPSWESYVKENDNELCIELDPGMAFGTGDHPTTSMCLKAIETFVKPTDSVIDV
GTGSGILSIASHLLGVQRIKALDIDEMAVNVAKENFKKNHCDDAIEAVPGNLLKNENEKF
NIVIANILAHIIEEMIEDTYNTLIEDGYFITSGIIEEKYQDIESQMKRIGFKIISVEHDN
GWVCIVGQKVSG* 


Sequence 1111 

Contig_0561_pos_1271_2065, 

is similar to (with p-value 3.0e-37) 

>sp:sp|P54461|YQEU_BACSU HYPOTHETICAL 28.8 KD PROTEIN IN
DNAJ-RPSU INTERGENIC REGION. >gp:gp|D84432|BACJH642_115
Bacillus subtilis DNA, 283 Kb region containing skin element.
NID: g2627063. >gp:gp|Z99117|BSUB0014_24 Bacillus subtilis
complete genome (section 14 of 21): from 2599451 to 2812870.
NID: g2634966. >gp:gp|D83717|D83717_3 Bacillus subtilis DNA for
DnaJ, YqeT, YqeU, YqeV, YqeW, YqeX, YqeY, complete and partial
cds. NID: g1890057.

atgaatcaaagcgctgatgaaaatcagtgcttttttattgaaaacaaagaagactatcat 
catatcgtgaatgttatgcgctataaagaaggacaaaatattattgtcactttttcagat 
gaaaatgtattcaaatgtaaaattatttcaataaacgatcaatcgattgaaattaaatta 

gtagaaaagcaacaaattaacactgaactacctcagaacattacaatatgtagtggttta
atcaaagcagacaaatatgaatggatgatacaaaaagcaactgaaatgggggcaaatgag 
tttatagctgtagctatggaacgttctgtggtcaagctcaatgattctaaagtagaaaag 
aaattatcgagatggcaaaaaattataaaggaagctgcagaacaaagttatcgtttaaca 
ataccaaatataaaatttaagtcgaatttaaaagaaatttatggtatgataagtcaatat 

gactatgttcttatagcatatgaagaacaagcaaagcacggtgaattaagtcaatttaag
caaacaattaaacaatttaagacacaggatcgtgttttaatcatatttggaccaaatgaa 
gaaactaatgcggattctaaattcacaaaattttatcaaaaccaaatcgacaaactgaaa 
aatgcaaataacgctcaacttaataacgaaaatcaaagtaaagttaacaacatgcttgaa 
gacatcaatacaaaatttgatagtattaaagctaaactagaaaatatcttgaatggatca 

aattcaggaaactaa

Sequence 1112 

MNQSADENQCFFIENKEDYHHIVNVMRYKEGQNIIVTFSDENVFKCKIISINDQSIEIKL 
VEKQQINTELPQNITICSGLIKADKYEWMIQKATEMGANEFIAVAMERSVVKLNDSKVEK 
KLSRWQKIIKEAAEQSYRLTIPNIKFKSNLKEIYGMISQYDYVLIAYEEQAKHGELSQFK
QTIKQFKTQDRVLIIFGPNEETNADSKFTKFYQNQIDKLKNANNAQLNNENQSKVNNMLE
DINTKFDSIKAKLENILNGSNSGN*

Sequence 1113

Contig_0561_pos_7853_8494,

is similar to (with p-value 4.0e-52) 

>gp:gp|Z99122|BSUB0019_48 Bacillus subtilis complete genome
(section 19 of 21): from 3597091 to 3809700. NID: g2636029.
>gp:gp|U56901|BSU56901_2 Bacillus subtilis putative
transcriptional regulator (yvhJ), Ycr59c/YigZ homolog (yvhK),
histidine kinase (degS), transcriptional regulator of
degradation enzyme (degU), (degV), (comFA), (comFB), (comFC),
flagellar protein (yviB), negative regulator of flagellin (flgM),
flagellar protein (yviC), flagellar-hook associated protein 1
(flgK), flagellar-hook associated protein 3 (flgL), (yviE),
transmembrane protein (yviF), (csrA), flagellin (hag), flagellar
protein (yviH), flagellar hook-associated protein 2 (fliD),
flagellar protein (fliS), flagellar protein (fliT), sigma-54
modulator homolog (yviI), and (secA) genes, complete cds.
NID: g1762326.

atggataaatccataattactattaaacaagcacattcaattgaaaatgtgataagtaaa 
tcacgctttatagcatatattaagcctgtttcgactgaaaatgaagcaaaagcttttata 
gatgaaattaaaacaaaacataaagatgcaactcataattgttcagcctatactgtcgga 

ccagagatgaatattcaaaaggcaaacgacgatggcgaaccaagtggaacagctggcatc
ccaatgcttgaaatactgaaaaaacaagagatacacaatgtttgtgtcgtcgtgacacgc 
tacttcggtggtatcaagttaggtgcaggcggtcttattagagcatatagcggcgccgtg 
cgtgatgtgatatatgatataggtagagtcgaactaagagaagctattccagtaaccgtt
acgttagattatgatcagacaggtaaatttgaatatgaacttgcctctactacattctta 

ttaagagaacaattttataccgataaagtaagttatcaaattgacgtagtaaaaaatgaa
tatgatgcttttatagactttttaaatcgaattacttctggaaattatgatttgaaacaa 
gaagaccttaaactattaccttttgatattgaaaccaattaa 

Sequence 1114 

MDKSIITIKQAHSIENVISKSRFIAYIKPVSTENEAKAFIDEIKTKHKDATHNCSAYTVG
PEMNIQKANDDGEPSGTAGIPMLEILKKQEIHNVCVVVTRYFGGIKLGAGGLIRAYSGAV
RDVIYDIGRVELREAIPVTVTLDYDQTGKFEYELASTTFLLREQFYTDKVSYQIDVVKNE
YDAFIDFLNRITSGNYDLKQEDLKLLPFDIETN* 

Sequence 1115

Contig_0561_pos_9667_8555,

is similar to (with p-value 0.0e+00) 

>pir:pir|A55856|A55856 llm protein - Staphylococcus aureus

gtgaggtacaacttattcaatgaaggtgaactgatgtatacactattacttatagctttt
actatgatagtcagtttaataattacacccattattattgtaatatcaaaaaaattagat
ttagtagatcgtcctaatttcagaaaagtacatacgaaacctatctcagtgatgggagga 
acggtcattttattttctttcttaatagggatttggctcggacaccctattgaacgtgag 
gttaaaccgcttatattaggtgcaattacaatgtatatggttggattgattgatgatat I 
tacgatctaagaccttatttaaagttagcaggtcaaattgttgcagctttaattgttacg 

ttttatggaattacaatagactttatttcattgccaattggtccaacgattcattttggc
atattcagcattcctattacagtaatatggattgtagcaattaccaatgctattaatctt 
atcgacggacttgatggacttgcctcaggcgtctcagcattggcattaatgactattgga 
ttcatcgctattttacaagcgaacatatttattatcatgatttgctgtgtacttttaggg 
tctttacttggtttcttattctataactttcacccagcgaaaattttcctaggtgatagt 

ggtgcattaatgataggatttattatcggtttcttatccttactcggctttaagaatatc
acatttattgcattattctttcctatagttatattagcggtgccatttattgatacatta 
tttgcaatgattcgtcgaatgaaaaaagggcaacatataatgcaagcggacaagtcacat 
ttacatcataaattacttgctttaggatatacgcatagacaaaccgttttacttatttat 
tcaatagcgattatgtttagtttatctagtgttatcctctatttatcccaaccgttgggt 
gcacttatgatgttcattctcattgtctttacgattgagttgatcgttgaatttactgga
ttaatagatgataattatcgaccaatattaaatttaattacaaaaaaaggaaatggtaag 
caacatcattatgatgagcatcaccgttcataa 

Sequence 1116

VRYNLFNEGELMYTLLLIAFTMIVSLIITPIIIVISKKLDLVDRPNFRKVHTKPISVMGG 
TVILFSFLIGIWLGHPIEREVKPLILGAITMYMVGLIDDIYDLRPYLKLAGQIVAALIVT
FYGITIDFISLPIGPTIHFGIFSIPITVIWIVAITNAINLIDGLDGLASGVSALALMTIG
FIAILQANIFIIMICCVLLGSLLGFLFYNFHPAKIFLGDSGALMIGFIIGFLSLLGFKNI
TFIALFFPIVILAVPFIDTLFAMIRRMKKGQHIMQADKSHLHHKLLALGYTHRQTVLLIY
SIAIMFSLSSVILYLSQPLGALMMFILIVFTIELIVEFTGLIDDNYRPILNLITKKGNGK 
QHHYDEHHRS* 

Sequence 1117 
Contig_0561_pos_7708_6842,

is similar to (with p-value 1.0e-48) 

>sp:sp|P32436|DEGV_BACSU DEGV PROTEIN. >pir:pir|S28596|D30191
hypothetical protein U3 - Bacillus subtilis >gp:gp|Z18629|
BSCOMFG_1 B. subtilis comF gene. NID: g39847.
>gp:gp|Z99122|BSUB0019_45 Bacillus subtilis complete genome
(section 19 of 21): from 3597091 to 3809700. NID: g2636029.
>gp:gp|U56901|BSU56901_5 Bacillus subtilis putative
transcriptional regulator (yvhJ), Ycr59c/YigZ homolog (yvhK),
histidine kinase (degS), transcriptional regulator of
degradation enzyme (degU), (degV), (comFA), (comFB), (comFC),
flagellar protein (yviB), negative regulator of flagellin (flgM),
flagellar protein (yviC), flagellar-hook associated protein 1
(flgK), flagellar-hook associated protein 3 (flgL), (yviE),
transmembrane protein (yviF), (csrA), flagellin (hag), flagellar
protein (yviH), flagellar hook-associated protein 2 (fliD),
flagellar protein (fliS), flagellar protein (fliT), sigma-54
modulator homolog (yviI), and (secA) genes, complete cds.
NID: g1762326.

atgaagattgcagttatgaccgattctacaagttatttaccacaacatataatagaacaa
tataacataccagtcgcttcactaagtgtaactttcgatgatggagtgaatttcactgag 

agtgatgatttttctgtagatgatttttataaaaaaatggcttcatctaaaactatacca
acaacaagccaacctgctattggcgattggattgaaaattttgagagattaagagaacaa 
ggatacactgatgtcatcgtgattaacttatcaagtggtataagcggaagctatccttca 
gcaacacaagctggtgaaatggttgaagatattcaagtacatacgtttgatagccgtctt 
gctgcgatgattgaaggtagctttgcaatttacgctgctcaattggtacaaaagggatat 

aaacctgatgatattattaatgaactaactgaaataagacaacatattggtgcatactta
attgttgatgatttaaaaaatttacaaaaaagtggtcgtatcactggagctcaagcttgg 
gtaggtacattattgaaaatgaaacctgtcttgcgttttgaagaagatggtaaaatacat 
ccacacgaaaaagtacgtactaaaaaacgtgcgctaaaatctttagaaacaaacattttt 
aaagaaatagaaggcatggaagatgtgacagtatttgtaataaacggtgataaaactgaa 

gatggaaagtcatttcttcagcaattaaaggaagatcatcctaatgttcatattcagtat
tgtgaatttggaccagtgatagcatcacatttaggatcaggcggtttaggattgggttac 
ttcccaagaagaatcgacattaattaa 

Sequence 1118 

MKIAVMTDSTSYLPQHIIEQYNIPVASLSVTFDDGVNFTESDDFSVDDFYKKMASSKTIP
TTSQPAIGDWIENFERLREQGYTDVIVINLSSGISGSYPSATQAGEMVEDIQVHTFDSRL
AAMIEGSFAIYAAQLVQKGYKPDDIINELTEIRQHIGAYLIVDDLKNLQKSGRITGAQAW
VGTLLKMKPVLRFEEDGKIHPHEKVRTKKRALKSLETNIFKEIEGMEDVTVFVINGDKTE 
DGKSFLQQLKEDHPNVHIQYCEFGPVIASHLGSGGLGLGYFPRRIDIN* 
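
The entry identifiers in this listing, such as Contig_0561_pos_7708_6842, appear to encode a contig number followed by two coordinates. The following is a minimal Python sketch of how such an identifier could be unpacked. It rests on two assumptions that the listing itself does not state: that the coordinates are inclusive, and that a first coordinate larger than the second marks an open reading frame on the reverse strand. The inclusive reading is at least consistent with Sequence 1117, whose 867 listed bases equal 7708 - 6842 + 1. The names `OrfLocus` and `parse_orf_id` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import NamedTuple


class OrfLocus(NamedTuple):
    """Hypothetical container for the fields packed into an entry identifier."""
    contig: str   # contig number as written, e.g. "0561"
    start: int    # first coordinate in the identifier
    end: int      # second coordinate in the identifier
    strand: str   # "+" or "-" (assumed from coordinate order)
    length: int   # span in bases, assuming inclusive coordinates


# Pattern for identifiers of the form Contig_<digits>_pos_<digits>_<digits>
_ID_PATTERN = re.compile(r"^Contig_(\d+)_pos_(\d+)_(\d+)$")


def parse_orf_id(identifier: str) -> OrfLocus:
    """Split an identifier such as 'Contig_0561_pos_7708_6842,' into its parts."""
    cleaned = identifier.strip().rstrip(",")  # the listing ends each identifier with a comma
    match = _ID_PATTERN.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    contig = match.group(1)
    start = int(match.group(2))
    end = int(match.group(3))
    # Assumption: start > end means the reading frame lies on the reverse strand.
    strand = "+" if start <= end else "-"
    # Assumption: both coordinates are inclusive.
    length = abs(end - start) + 1
    return OrfLocus(contig, start, end, strand, length)


if __name__ == "__main__":
    locus = parse_orf_id("Contig_0561_pos_7708_6842,")
    print(locus)
```

A span that is a multiple of three, as here (867 = 289 codons), is a quick sanity check that a garbled identifier has been repaired to the right digits.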


Sequence 1119 

Contig_0561_pos_6305_5436,

is similar to (with p-value 4.0e-44) 

>sp:sp|P39145|CMF1_BACSU COMF OPERON PROTEIN 1.
>pir:pir|S28597|S28597 hypothetical protein F1 - Bacillus
subtilis >gp:gp|Z18629|BSCOMFG_2 B. subtilis comF gene.
NID: g39847. >gp:gp|Z99122|BSUB0019_44 Bacillus subtilis
complete genome (section 19 of 21): from 3597091 to 3809700.
NID: g2636029. >gp:gp|U56901|BSU56901_6 Bacillus subtilis
putative transcriptional regulator (yvhJ), Ycr59c/YigZ homolog
(yvhK), histidine kinase (degS), transcriptional regulator of
degradation enzyme (degU), (degV), (comFA), (comFB), (comFC),
flagellar protein (yviB), negative regulator of flagellin (flgM),
flagellar protein (yviC), flagellar-hook associated protein 1
(flgK), flagellar-hook associated protein 3 (flgL), (yviE),
transmembrane protein (yviF), (csrA), flagellin (hag), flagellar
protein (yviH), flagellar hook-associated protein 2 (fliD),
flagellar protein (fliS), flagellar protein (fliT), sigma-54
modulator homolog (yviI), and (secA) genes, complete cds.
NID: g1762326.

atgggacataatattgctattgtatcacctcgtgtagacgttattattgagataagtcat 
cgaattaaagatgcttttatcgatgaacatatagatgtgctacatcaatctagtagacag
caatataatggtcattttgttattgctactatccatcaattattgaggtttaaacagcat 

tttgatactgtatttgtcgatgaggtagatgcttttccgttgtctatggatccacaatta
tcaaatgcaatacaacttgcttcaaaatcgaatcattcacatattttcatgacggccaca 
ccaccgcgtcattttttaaaacaattccccccagaaaaaataattaagttaccagcccgt 
tttcaccgatccccccttcctattcctaagttcaaatatttcaaattaaaatcaacacga 
aaacaaaatttattacttaatatatttagatatcaaattaaccaacaacgttttactttg 

gtctttattaataatatagaaattatgaataaaatgtatcaacagtataaaatggacatc
cctgatttgatttgcgttcacagtgaagatgatttacgatttgaaaaaattgaagcttta 
agacgaggacaacacaaaattgtattcactacaactattttagaaagaggatttacaatg 
acacacttagatgtcgttgtagttgatgctggaagttttcaacaagaggctttaattcaa 
attgctggtcgcgtaggacgtaaacagcagtctccaagtggcttagttttatttcttcat 

gaaggtgttacattatcgatgattttagctaaaagaaacattatttcaatgaatcgttta
gcaattaaaaggggatggattgatgcgtaa 

Sequence 1120 

MGHNIAIVSPRVDVIIEISHRIKDAFIDEHIDVLHQSSRQQYNGHFVIATIHQLLRFKQH 
FDTVFVDEVDAFPLSMDPQLSNAIQLASKSNHSHIFMTATPPRHFLKQFPPEKIIKLPAR
FHRSPLPIPKFKYFKLKSTRKQNLLLNIFRYQINQQRFTLVFINNIEIMNKMYQQYKMDI 
PDLICVHSEDDLRFEKIEALRRGQHKIVFTTTILERGFTMTHLDVVVVDAGSFQQEALIQ 
IAGRVGRKQQSPSGLVLFLHEGVTLSMILAKRNIISMNRLAIKRGWIDA* 

Sequence 1121

Contig_0561_pos_5227_4769,

putative peptide of unknown function 

atggaacaattgttttgtgattatagttatgatggtatgatgaaagaaatcatacaccag 
tataagattaagcgagacttctatttggcagaagtattggcgagaaaattagttttacct 

caaacgcaatatgattatatagttcccattccttctccaattgaacgcgacattgaacgt
acatttaatcctgtgaccactgtcttagataaaatgggcatctcatatcaagatgtatta 
ggtacacatatacgtcctaagcagtcaaagttaggaaagattgaacgttcaaaagcccct 
aatccattttatataaaagatgaagagataaatatcgaagggaaagtaatactactcata 
gatgatatttatacaacagggttaactattcatcacgcagggtgtaaattgtacgataaa 

aaagtcagaaaattcaaagtgtttgcgtttgcacgataa

Sequence 1122 

MEQLFCDYSYDGMMKEIIHQYKIKRDFYLAEVLARKLVLPQTQYDYIVPIPSPIERDIER
TFNPVTTVLDKMGISYQDVLGTHIRPKQSKLGKIERSKAPNPFYIKDEEINIEGKVILLI
DDIYTTGLTIHHAGCKLYDKKVRKFKVFAFAR*

Sequence 1123 

Contig_0562_pos_5078_5830, 

putative peptide of unknown function 
atgaatgccatgaaagataatactaataaattgcatcaagcgttaactaaaatacaacaa
aaaatgcccggggatgggggagacacgcctcatcaagatatggctaaaccatataaacta 
acaacgcactatttatatggttcatcagattctacgtattttgatatgataaatcctatt 
ttaattggattttttgtctttttctttacgtttttaatttctggcattggcttattaaaa 
gagcgtacttctggcacattagaacgtttacttgcctctccaataaaaagaagtgaaatt
atttttggttatgttttcggttatggtagttttagcgttatccaaacaatagttgtcgta 
ttatatgcaatttatattctgcatatagacttagtaggttcgatatggttcgtactatta 
acggcaatattaacagcgcttgtcgctgtgacattcggtatattattatctacctttgct 
tcctcagaattccaaatgattcaatttataccattagtcatagtgccacaagtactattt 
gcaggcattataccaattgaatcaatgaataaaggattacaatacttttcacatatcatg
ccgttattctataccggccaaacgatgcaaaatattatgatcaagggttatggattcaac 
gatatttacatttatttaattgtgttattcgcatttttcattttcttattgattttaaat 
attataggcatgaaaagatatagaaaagtttag 

Sequence 1124

MNAMKDNTNKLHQALTKIQQKMPGDGGDTPHQDMAKPYKLTTHYLYGSSDSTYFDMINPI 
LIGFFVFFFTFLISGIGLLKERTSGTLERLLASPIKRSEIIFGYVFGYGSFSVIQTIVVV
LYAIYILHIDLVGSIWFVLLTAILTALVAVTFGILLSTFASSEFQMIQFIPLVIVPQVLF
AGIIPIESMNKGLQYFSHIMPLFYTGQTMQNIMIKGYGFNDIYIYLIVLFAFFIFLLILN
IIGMKRYRKV*

Sequence 1125 

Contig_0562_pos_5840_6490,

putative peptide of unknown function 

atgaaccaagatattaagtcattagttgaaaccattgtgcctcaacttgaatatttaagc
gataaacaaagacgtgtcatagaaagtgctattgcattattcagtgaacaaggatttgat
aaaacgagtactaaagaaattgcgcagcgtgcaaatgtcgcagaaggaacggtatttaag 
cagtttaaaagtaaaagaatgttattatacgcaggattaattccaattttaagagatcat 
atcgcacctgtagctgttaaacaatttacagatgaattaaacgaagtaacccattttgat 

gcatttataaatttatttgtagaaaatagatctaaatttatttatgacaatagacgtatt
cttaaagtcatcttaaatgaagctattactaatgaagattttcaaaatatattagttaat 
attttcacccataaattaacgagtaaattaaaagataaaattgaatggtttatcgataat 
ggtgacatgcgcaatgttaaacctgagttttttatacgtacggtcgtcgcacaaatttta 
aatttaaatatcccaataatagttaataatgactatactaagggtgaaaactatcagcag 

tttgcgttattcgtaaaagagggcttatataggatgtttaagcgagaatag

Sequence 1126 

MNQDIKSLVETIVPQLEYLSDKQRRVIESAIALFSEQGFDKTSTKEIAQRANVAEGTVFK 
QFKSKRMLLYAGLIPILRDHIAPVAVKQFTDELNEVTHFDAFINLFVENRSKFIYDNRRI 
LKVILNEAITNEDFQNILVNIFTHKLTSKLKDKIEWFIDNGDMRNVKPEFFIRTVVAQIL
NLNIPIIVNNDYTKGENYQQFALFVKEGLYRMFKRE*

Sequence 1127 

Contig_0562_pos_7987_8625,

putative peptide of unknown function

atgaaattagaggaaattaatacaatagataaaaatgatttttctaaaaaaccaacttat 
ggggctgaaaaaaattattttaaaattgctgttatggattttcaaaaagagagtacaaag 
agactagacattgatatatatgatactggaactaagatagcaaaatattggagcgaagga 
tttttatcaatttatccgttgagagataatgaattaaatactaatgatattattaactct 

ttaaaagaaaatactctgttcaatgaattacctaatataacagaaaaagatcttaatcaa
tctactaatgactttctaagtgataatagagaattttcatttgtaaaattaagagatcat 
attaatcagcgctttgaaacagacttcaaaaatgatgaactatttaaagatgacaaatta 
aatcatatagacgcagaatttaatttagaaccaaggattgtaggtaaatatttacagaaa 
aaaataaaaattaatgatgtaatcagtttagatattaataatattagtaaagcaacaaaa

ggaggcctaatgaaagtgtctaatgataaaagatacttaaaaatacgtctcgaaacaaat
attaaagaagaccttcccgatattcaggaaggagattga

Sequence 1128 

MKLEEINTIDKNDFSKKPTYGAEKNYFKIAVMDFQKESTKRLDIDIYDTGTKIAKYWSEG
FLSIYPLRDNELNTNDIINSLKENTLFNELPNITEKDLNQSTNDFLSDNREFSFVKLRDH
INQRFETDFKNDELFKDDKLNHIDAEFNLEPRIVGKYLQKKIKINDVISLDINNISKATK 
GGLMKVSNDKRYLKIRLETNIKEDLPDIQEGD* 

Sequence 1129

Contig_0562_pos_8629_9843,

putative peptide of unknown function 

atgagttattattggtattatgaaattttaaaatttgataatgatttaaataattatttc 
tctaatgaagacttaaaagttttttcttctaaatgtctttcaaatgaagaagaagatgaa 

ttatatttacaaggatcaaaaatatttattgatgacgactctaaaattaaacattcagaa
agtattaaagatgtgtatgagtactattatgaaaaattcaggaaattagaaaattctaaa 
tctgatttttttatatttaaggacagtaaaaagttatttgaaaaaattgaagtacctatt 
aaacataatacgatacgaggagaatatattgaagatattactgtcagaaatataaatagc 
aatattttgcaaattaatccttattttttcaatgaaatgctggaagtatataaagaatat 

gaaaaagaatgcagtaatataaggaagttgcgtaatttacaatctgcaatcattttatct
gataaaatttcatttgataacgatatcatgacattatactttcatgattcagataatgaa 
gaagataagttccaaaagaacaagtttgatttttcaaacgctaatgtatcttcattatat 
ttttatgtttttgattgggtagatattgaaaacgacaatctagaaatagtggatttaaaa 
tataaagtagcaaaaatttatttatctaaattcgaaaaggtttctgatgacgaaaaaata 

aagaaaaaagaaatacatcatgacttaaatgttatgtataatttaatacttcaaaaaaaa
tctcaaaagtattatgaatataacaaagtaataaggaatcataaaatagaaatcattcaa
cgaaaaatagaattaaaaaatgaacttaataaaaaattaatgagcatgatggtattcatt 
cccgtaactatttatggtttatatataacaatccaaaagagcaaagaatcactaaacatt 
ttcaataatgactttaatattatttatttcagttctctagttgcacttatatttataata 

ttatctttaataaatgatgtgaaatctattaatagtgactatgaaacaattattttagaa
atcataaatacatataaaataaacaaaagtatggacgattttggaaataatataagttta 
agtgattttaaattttctttattttggatttttataatgattataactcttataattttt 
attttaattaagtga 

Sequence 1130

MSYYWYYEILKFDNDLNNYFSNEDLKVFSSKCLSNEEEDELYLQGSKIFIDDDSKIKHSE 
SIKDVYEYYYEKFRKLENSKSDFFIFKDSKKLFEKIEVPIKHNTIRGEYIEDITVRNINS 
NILQINPYFFNEMLEVYKEYEKECSNIRKLRNLQSAIILSDKISFDNDIMTLYFHDSDNE 
EDKFQKNKFDFSNANVSSLYFYVFDWVDIENDNLEIVDLKYKVAKIYLSKFEKVSDDEKI 

KKKEIHHDLNVMYNLILQKKSQKYYEYNKVIRNHKIEIIQRKIELKNELNKKLMSMMVFI
PVTIYGLYITIQKSKESLNIFNNDFNIIYFSSLVALIFIILSLINDVKSINSDYETIILE
IINTYKINKSMDDFGNNISLSDFKFSLFWIFIMIITLIIFILIK*

Sequence 1131 
Contig_0562_pos_13601_13936,

is similar to (with p-value 3.0e-38) 

>gp:gp|AF051916|AF051916_2 Staphylococcus aureus plasmid pJE1
remnant of replication protein Rep (rep), trimethoprim
resistance protein DfrA (dfrA), thymidylate synthetase ThyE
(thyE), and putative transposase Tnp (tnp) genes, complete cds;
and unknown gene. NID: g3676404.
atgaatgatataactaaacgtttattaaaaccaataattaatgagctttcttcaattttt 
aataaccttcatattaataagatcaaagctaaaaaaggacgtaaaattgaatggttagag 
tttacctttgacgctgagaaacgcattcacaacaagcgacaaccacaaatgactaatata 

ggtaagtcgcgccaatataccaatcgtgaaaaaacacctaaatggttagacgaaaagata
tataaacaatctcaagagatacataatgaagatgcaaaattaaaacaagatcgagaggca 
tttcaacgtcaattagaagaaaaatgggaggaataa 

Sequence 1132 

MNDITKRLLKPIINELSSIFNNLHINKIKAKKGRKIEWLEFTFDAEKRIHNKRQPQMTNI
GKSRQYTNREKTPKWLDEKIYKQSQEIHNEDAKLKQDREAFQRQLEEKWEE* 
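
Each nucleotide entry in this listing is followed by its conceptual translation, so any repaired character can be checked by translating the DNA and comparing the result with the peptide. The following is a minimal Python sketch of such a check. It assumes the standard codon assignments (which the bacterial translation table shares for internal codons) and mirrors two conventions that can be observed in the listing rather than being stated in it: stop codons are written as `*`, and an incomplete trailing codon is written as `X` (as at the end of Sequence 1104). Start codons are translated literally, as the listing does for Sequence 1116, which begins with gtg and V. The function name `translate` is a hypothetical helper.

```python
from itertools import product

# Standard codon assignments in t, c, a, g order (ttt, ttc, tta, ttg, tct, ...).
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring whitespace and letter case.

    Codons that are incomplete or contain unrecognised characters become 'X'.
    """
    seq = "".join(dna.split()).lower()  # drop spaces and line breaks left by OCR
    residues = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        residues.append(CODON_TABLE.get(codon, "X"))
    return "".join(residues)


if __name__ == "__main__":
    # Last line of Sequence 1131, which should give the tail of Sequence 1132.
    print(translate("tttcaacgtcaattagaagaaaaatgggaggaataa"))  # FQRQLEEKWEE*
```

Because a residue that disagrees with the listed peptide localises the damage to one codon, this kind of comparison is a cheap way to decide which of the nucleotide or the peptide line carries the extraction error.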

Sequence 1133 

Contig_0562_pos_7830_7156,
is similar to (with p-value 0.0e+00)

>pir:pir|S04166|S04166 transposase 2 - Staphylococcus aureus
plasmid pSK1 transposon Tn4003 >gp:gp|X13290|SATN4003_5
Staphylococcus aureus multi-resistance plasmid pSK1 DNA
containing transposon Tn4003. NID: g46747.
>gp:gp|U40259|SEU40259_7 Staphylococcus epidermidis
trimethoprim resistance plasmid pSK639. NID: g1762079.
>gp:gp|U40382|SEU40382_1 Staphylococcus epidermidis plasmid
pSK697 insertion sequence IS257(697B) putative transposase
gene, complete cds. NID: g1762093. >gp:gp|U40383|SEU40383_1
Staphylococcus epidermidis plasmid pSK697 insertion sequence
IS257(697C) putative transposase gene, complete cds.
NID: g1762095. >gp:gp|AF051916|AF051916_3 Staphylococcus aureus
plasmid pJE1 remnant of replication protein Rep (rep),
trimethoprim resistance protein DfrA (dfrA), thymidylate
synthetase ThyE (thyE), and putative transposase Tnp (tnp)
genes, complete cds; and unknown gene. NID: g3676404.
>gp:gp|AF051917|AF051917_15 Staphylococcus aureus plasmid
pSK41, complete sequence. NID: g3676412.
>gp:gp|AF051917|AF051917_17 Staphylococcus aureus plasmid
pSK41, complete sequence. NID: g3676412.

atgaactatttcagatataaacaatttaacaaggatgttatcactgtagccgttggctac 
tatctaagatatgcattgagttatcgtgatatatctgaaatattaaggggacgtggtgta 
aacgttcatcattcaacggtctaccgttgggttcaagaatatgccccaattttatatcaa 
atttggaagaaaaagcataaaaaagcttattacaaatggcgtattgatgagacgtacatc 

aaaataaaaggaaaatggagctatttatatcgtgccattgatgcagagggacatacatta
gatatttggttgcgtaagcaacgagataatcattcagcatatgcgtttatcaaacgtctc
attaaacaatttggtaaacctcaaaaggtaattacagatcaggcaccttcaacgaaggta 
gcaatggttaaagtaattaaagcttttaaacttaaacctgactgccattgtacatcgaaa
tatctgaataacctcattgagcaagatcaccgtcatattaaagtaagaaagacaaggtat 

caaagtatcaatacagcaaagaatactttaaaaggtattgagtgtatttacgctctatat
aaaaagaaccgcaggtctcttcagatctacggattttcgccatgccacgaaattagcatc 
atgctagcaagttaa 

Sequence 1134 

MNYFRYKQFNKDVITVAVGYYLRYALSYRDISEILRGRGVNVHHSTVYRWVQEYAPILYQ
IWKKKHKKAYYKWRIDETYIKIKGKWSYLYRAIDAEGHTLDIWLRKQRDNHSAYAFIKRL
IKQFGKPQKVITDQAPSTKVAMVKVIKAFKLKPDCHCTSKYLNNLIEQDHRHIKVRKTRY
QSINTAKNTLKGIECIYALYKKNRRSLQIYGFSPCHEISIMLAS*

Sequence 1135

Contig_0562_pos_4253_3876,

putative peptide of unknown function 

atgcaattttattatagtaattgtgaacagaatgcatcaaacttattagtcatcaacgta 
caacctaatgaaggattttctttatgtgtgaatggtaagaaaagtaatcaaaataatgaa 
atgcaaaaagtgaagctttcttatactatgccgattaaagataaaatgaacacagttgat
gcatatgaaaatcttatttacgatacattaattggagaacaaacaaaatttacgcattgg 
gaagaattaaaaattcttggaaatttattgatgatattgaaaatgtatggaaacaagaat 
agccacagtttcctaattatgcctttggatgctatgggcctaaagaaagtgaaaaattac 
ttagtgaagacggattga 


Sequence 1136 

MQFYYSNCEQNASNLLVINVQPNEGFSLCVNGKKSNQNNEMQKVKLSYTMPIKDKMNTVD 
AYENLIYDTLIGEQTKFTHWEELKILGNLLMILKMYGNKNSHSFLIMPLDAMGLKKVKNY
LVKTD* 


Sequence 1137 

Contig_0562_pos_3757_1499,

is similar to (with p-value 0.0e+00)

>gp:gp|L76359|STMDRRC_1 Streptomyces peucetius daunorubicin
resistance protein (drrC) gene, complete cds. NID: g1196906.
atggattttattaatattacaggtgcttcacaaaataacttgaaaaacatagatgtaaat 
atcccaaaacacttagtaacggtatttacaggtcgttctggttcagggaaatcatcttta 
gtgtttaatactgttgctgcggagtctgaacagctactaaatgaaagttattctagttat 
attcaatttcatttaaatcaacaacccagaccgaaagtaaagaaaattaaaaatcttcct
gtagcaatgacgattaatcagaaaagattcaatgggaattctcgctccacggtaggaaca 
gtttcagatatatatgcttctgttagattactgtggtctagaataggcgaaccgtttgtt 
ggttattcagatgcatattccttcaatagtcctaagggcatgtgtaaaacttgtgaggga 
ttaggatatattgaagacattaacttagatgaattgctagattgggataagtctttaaat 

gaaggtgcaatagactttccttcttttggaccagacaaagagcgtggtaaagcctatcga
gatagtggtttatttgataataataaaaaattgaaagattatacagaagatgaattagaa
ttgtttttatatcaagagccaatgacattaaaaaatcctcctaaagaatggagaaagtca
gctaaatatgtaggactaatacctagattcagtagaatatttttaggtgataaagaattt 
aataagaaacgctacgccaaacatcttaaaaatgtagtaaataataaaatctgttcaaca 

tgtaaaggtcaacgtctaaactcgaaaatattaagttctaaaattatgagtaaaaatatt
tctgatttcacacaaatgacaattaaggaaaatttagagtttcttaataaattagaggat 
ccaacagccaaatatattattgatcctctcaaaaagcagttagaagcactagaatatatt 
ggattaagttatttaacgcttaaccgtgtcacaacgacattatcaggcggtgaagcgcaa 
cggcttaaattaatacgtcatttaaatagttctttatcggatttagtttacattatagat 

gaaccaagtgttggcttgcatccggaagatatagctaaaatcaatgaaattttaaaatca
ttaaaagaaaaaggtaatactgtgttaattgttgaacatgatcccgatgtcattaaagaa 
ggagactatatcatagatatggggccaggttcaggaaaaaacggcggtgaaatcacattt 
gaaggaacatataatgaattactatcttcaaatacttcgacaggtaacgcattacgtaac 
aaacataatttaaaagagaatattcgtgaagctaaccacttttataatatcggtcctgtg

acacaaaacaatttaaataacgtaaaaacgtctatacctaaacacgtattaacagtctta
acaggtgttgctggttcaggtaagagtacccttgttaaagcaggttttgaaaataatgac 
cataccatctttattgatcaaaaagcagttcaaggatctaatagatctaatctattaacg 
tatttaggtgtttttgatagtgtaagaagctatttcagtaaagaaacaggcttaaataaa 
gctatgtttagttataattcaaaaggtgcctgtccaaattgtggtggaaagggctatatt 

aaaacggaacttgcttttatgggtgatttttcacagacatgtgaagtttgtcatggcaaa
cgttataaacaagaagtattagatgctaccatagacgggtattcaattgccgatgttctc 
aatttgacggttgacgaaggtatcattttctttgataaaaagaatgatattaagtcaaaa 
ttacaatctgtaagtaagacaggtttgaattatatgtcactaggacaacctttgtccact 
ttgtctggtggagagattcaacgcgtgaaactaggacaacatttagatgaagagataaag 

aatagtatttttatttttgacgaaccaactacaggcctacatgaatcggatatccctata
ttgatggagtgttttgatgatttaattgatcaaaacaatactgttattttgattgaacat 
aatttatcgattatgtgtgaagcagattggatcatcgatgtcggcccaggcccagggttg 
gatggcggaaaggtccaatttagtggaacacctaaaaacttcattgatagttcagaaaca 
ttgacatctaaacacttgaaacgctatatcaaacagtaa 


Sequence 1138 

MDFINITGASQNNLKNIDVNIPKHLVTVFTGRSGSGKSSLVFNTVAAESEQLLNESYSSY
IQFHLNQQPRPKVKKIKNLPVAMTINQKRFNGNSRSTVGTVSDIYASVRLLWSRIGEPFV 
GYSDAYSFNSPKGMCKTCEGLGYIEDINLDELLDWDKSLNEGAIDFPSFGPDKERGKAYR 

DSGLFDNNKKLKDYTEDELELFLYQEPMTLKNPPKEWRKSAKYVGLIPRFSRIFLGDKEF
NKKRYAKHLKNVVNNKICSTCKGQRLNSKILSSKIMSKNISDFTQMTIKENLEFLNKLED
PTAKYIIDPLKKQLEALEYIGLSYLTLNRVTTTLSGGEAQRLKLIRHLNSSLSDLVYIID
EPSVGLHPEDIAKINEILKSLKEKGNTVLIVEHDPDVIKEGDYIIDMGPGSGKNGGEITF
EGTYNELLSSNTSTGNALRNKHNLKENIREANHFYNIGPVTQNNLNNVKTSIPKHVLTVL 

TGVAGSGKSTLVKAGFENNDHTIFIDQKAVQGSNRSNLLTYLGVFDSVRSYFSKETGLNK
AMFSYNSKGACPNCGGKGYIKTELAFMGDFSQTCEVCHGKRYKQEVLDATIDGYSIADVL 
NLTVDEGIIFFDKKNDIKSKLQSVSKTGLNYMSLGQPLSTLSGGEIQRVKLGQHLDEEIK 
NSIFIFDEPTTGLHESDIPILMECFDDLIDQNNTVILIEHNLSIMCEADWIIDVGPGPGL
DGGKVQFSGTPKNFIDSSETLTSKHLKRYIKQ* 


Sequence 1139 

Contig_0563_pos_4995_4084,

putative peptide of unknown function 

gtgattaggaatttacttgcaatgtgttatttgtatctaggtgagtatgatagcgccaaa 






gcaatgtttgaagaacttttaaaggaagataattcagacgtgcatgcactttgtcactac 
acattattactttataataaaaaagaaacagaaaaatatcaaaaatatcttaaaatactt 
aataaagtagtaccactaaatgacgacgaaacctttaaattaggaatcgtattgagttat 
ttaaaacagtatcgtgcttctcaaaatttactttatccactttataaaaaaggtaaattt 
gtctctattcaaatgtataatgcattgagtttcaatttttattacctaggaaataaagac
gaaagtattgagatgtggaacaagctcactcaaatttctgaagttgatgttggttatgca 
ccttgggtaattgaggaaagtaaaacggtatttgaatcacgagtgttaccattattacta 
gatgataataatcattatcgactttacggtatttttttacttcatcaattaaatggaaaa 
gaaatactaatgactgaagatatttggtcaattcttgaatcaatgaatgactatgagaaa 

ctttatctcacatatttggtacaaggactcacactcaataaattagattttatacacaga
ggtatgcaaaggttgtataattttaagaaattcaaatataacacgtctttatttacagat 
tggattaatcaagcagaaatgattatagctgaaaatgtagatttagtagatgtcgataga 
tatgtagctgcatttgtttacctatcgtatcgtcgttctagccaaccacttaccaagagg 
caattgatggacgattttaatgtttctagatacaaactgaataaagcaattgaatttata 

ttgagcatataa

Sequence 1140

VIRNLLAMCYLYLGEYDSAKAMFEELLKEDNSDVHALCHYTLLLYNKKETEKYQKYLKIL 
NKVVPLNDDETFKLGIVLSYLKQYRASQNLLYPLYKKGKFVSIQMYNALSFNFYYLGNKD 
ESIEMWNKLTQISEVDVGYAPWVIEESKTVFESRVLPLLLDDNNHYRLYGIFLLHQLNGK
EILMTEDIWSILESMNDYEKLYLTYLVQGLTLNKLDFIHRGMQRLYNFKKFKYNTSLFTD 
WINQAEMIIAENVDLVDVDRYVAAFVYLSYRRSSQPLTKRQLMDDFNVSRYKLNKAIEFI 
LSI* 

Sequence 1141

Contig_0563_pos_4019_3087,

is similar to (with p-value 0.0e+00) 

>gp:gp|AJ223781 |SAAJ3781_1 Staphylococcus aureus trxB gene. 
NID: g3582102. 

atgactgaagtagattttgatgtagcaataatcggtgcaggtcctgccggtatgacagca
gcagtatatgcatctcgtgccaatttaaaaactgtcatgattgaacgcggtatgccaggc 
ggtcaaatggcaaacactgaagaagtagagaattttccaggatttgagatgatcacaggt 
cctgacttatctactaaaatgtttgaacatgctaaaaaatttggtgcggaataccaatat 
ggcgatattaaatctgttgaagataaaggcgactataaagttatcaatttagggaataaa 

gagataacagcacatgcagttattatctcaactggagcagagtataaaaagattggcgtt
cctggtgaacaagaattaggaggacgtggagtaagttattgtgcggtttgtgatggagca 
ttctttaaaaataaacgtcttttcgtaattggcggcggagattcagcggtagaagaaggt 
actttcttaactaaatttgcagataaagtaacgattgttcaccgtagagatgaattacgt 
gcacaaaacatcttgcaagaacgtgccttcaaaaatgataaagttgactttatttggagt 

catacacttaaaacaattaatgaaaaagatggtaaagttggttcagttacacttgaatca
actaaagatggtgctgaacagacttatgatgccgacggtgtattcatttatattggaatg 
aaaccactcacagcaccatttaaaaatcttggtattacaaatgacgcgggatacattgtc 
acacaagatgacatgagtactaaagtacgaggtatttttgctgcaggtgacgttcgtgat 
aaagggttacgtcaaattgttactgctacaggagacggtagtattgcggctcaaagtgca 

gctgattatattacagaattaaaagataattaa

Sequence 1142 

MTEVDFDVAIIGAGPAGMTAAVYASRANLKTVMIERGMPGGQMANTEEVENFPGFEMITG 
PDLSTKMFEHAKKFGAEYQYGDIKSVEDKGDYKVINLGNKEITAHAVIISTGAEYKKIGV
PGEQELGGRGVSYCAVCDGAFFKNKRLFVIGGGDSAVEEGTFLTKFADKVTIVHRRDELR
AQNILQERAFKNDKVDFIWSHTLKTINEKDGKVGSVTLESTKDGAEQTYDADGVFIYIGM 
KPLTAPFKNLGITNDAGYIVTQDDMSTKVRGIFAAGDVRDKGLRQIVTATGDGSIAAQSA 
ADYITELKDN* 

Sequence 1143

Contig_0563_pos_2911_2003, 

is similar to (with p-value 3.0e-88) 

>sp:sp|O06973|YVCJ_BACSU HYPOTHETICAL 33.9 KD PROTEIN IN
CRH-TRXB INTERGENIC REGION. >gp:gp|Z99121|BSUB0018_163 Bacillus
subtilis complete genome (section 18 of 21): from 3399551 to
3609060. NID: g2635827. >gp:gp|Z94043|BSZ94043_9 B. subtilis
genomic DNA fragment (88 kb). NID: g1945641.
atgacaagcaacgaaaaagaaatgggtaaaagtgaattgttagttgttacaggtatgtct 
ggagcgggtaaatcattggtgattcaaagtctcgaagatatgggatttttctgtgtagat
aatttaccacctgtactattacctaaatttgtagaattgatggctcaaggaaatccttca 
ttgcaaaaagtagcaattgcaatagatttaagaggtaaggaattatttaaatctctagtt 
aaagaaattgatattattaaaagtcgtaatgacgtgattttagatgttatgtttttagaa 
gctaaaactgaaaaaattatttcacgttataaagaatcaagaagagcgcacccactaaat 

gaacaaggacaaagatcattaatagatgcaataaatgaggaacgtgaacatctatcagaa
atccgaagtatcgctaattacgtgattgatacaacaaaattaaaacctaaagaattaaag 
caacgcatttcaaagttttatttagatgaaaactttgaaacatttacaatcaacgtgaca 
agtttcggtttcaagcatggtatacaaatggatgctgatttagtttttgatgtcagattt 
ctacctaatccctactatgtagaggaattgcgtccatttactggtttagatgagccagtg 

tacaattacgttatgaagtggaaagaaacccaaatattttttgataaattaacagattta
ttaaaatttatgattcctggctacaaaaaagaaggtaaatcgcaattggttattgctata
ggttgtacgggtggacaacatcgatcagtcgcattagctaaacgtttagctgaatatctt
aacgagatttttgaatataatgtttatgtgcatcatagagatgcgcatattgaaagtgga 
gagagataa 


Sequence 1144 

MTSNEKEMGKSELLVVTGMSGAGKSLVIQSLEDMGFFCVDNLPPVLLPKFVELMAQGNPS 
LQKVAIAIDLRGKELFKSLVKEIDIIKSRNDVILDVMFLEAKTEKIISRYKESRRAHPLN
EQGQRSLIDAINEEREHLSEIRSIANYVIDTTKLKPKELKQRISKFYLDENFETFTINVT
SFGFKHGIQMDADLVFDVRFLPNPYYVEELRPFTGLDEPVYNYVMKWKETQIFFDKLTDL
LKFMIPGYKKEGKSQLVIAIGCTGGQHRSVALAKRLAEYLNEIFEYNVYVHHRDAHIESG 
ER* 

Sequence 1145 
Contig_0563_pos_0_861,

putative peptide of unknown function 

atgaaaaatgaactaacacgcatagaagttgacgaatcgaatgctaaagcagagctcagt 
gcattaattcgcatgaatggcgcacttagtctatcaaatcaacagtttgtaattaatgta 
cagacagaaaatgcgacaacagctcgtcgaatttactctcttatcaaacgtatatttaat 

gttgaagttgaaattttagttagaaaaaagatgaaattgaaaaaaaacaatatttatata
tgtcgaacaaagatgttagcgaaagaaatactaaatgatttaggaattttaaaaaaggga 
gtttttactcacgatattgatccggatatgattaaagatgatgaaatgaaaagaagttat 
ttaagaggggctttcttagcaggtggttctgtaaataatcctgaaacatcttcatatcat 
cttgaaattttttcacaatatgaagatcattccgaaggtcttactaaattgatgaatagt

tatgaactcaatgcgaaacatttggaacgtaaaaaagggagtattgcgtatcttaaagaa
gctgaaaaaatttccgactttcttagtttgataggtggctatcaagcattgttaaagttt 
gaagatgtaagaattgtccgtgatatgcgtaattcggttaatcgtcttgttaattgtgaa 
acagcaaatcttaataaaactgttagcgcagcaatgaaacaggttgaaagtatacaatta 
attgatgaagaaattgggcttgaaaatttacctgatcgtttaagagaagtagcgaagctc 

agagtagaacatcaagaaatatcgttaaaagaattgggtgagatggtttctacagggcct
atatctaaatcaggtACCATT 

Sequence 1146

MKNELTRIEVDESNAKAELSALIRMNGALSLSNQQFVINVQTENATTARRIYSLIKRIFN 
VEVEILVRKKMKLKKNNIYICRTKMLAKEILNDLGILKKGVFTHDIDPDMIKDDEMKRSY
LRGAFLAGGSVNNPETSSYHLEIFSQYEDHSEGLTKLMNSYELNAKHLERKKGSIAYLKE 
AEKISDFLSLIGGYQALLKFEDVRIVRDMRNSVNRLVNCETANLNKTVSAAMKQVESIQL 
IDEEIGLENLPDRLREVAKLRVEHQEISLKELGEMVSTGPISKSGTI 

Sequence 1147

Contig_0564_pos_1869_2756,

is similar to (with p-value 0.0e+00) 

>sp:sp|P37527|YAAD_BACSU 31.6 KD GUANYLYLATED PROTEIN IN
DACA-SERS INTERGENIC REGION. >gp:gp|D26185|BAC180K_75 B.
subtilis DNA, 180 kilobase region of replication origin.
NID: g467326. >gp:gp|Z99104|BSUB0001_11 Bacillus subtilis
complete genome (section 1 of 21): from 1 to 213080.
NID: g2632267.
atgtctaaaatagtaggatcagatcgagttaaaagaggaatggctgaaatgcaaaaaggc 
ggtgtcattatggacgtcgttaatgcagaacaagctaaaattgctgaagaagccggagct
gttgccgtaatggcattagagcgtgtaccatcagatattcgtgctgctggcggtgttgca 
cgtatggcgaatcctaaaatagttgaagaagttatgaatgccgtatcaattccggttatg 
gctaaagccagaattggtcatattacagaagctagagttttagaatcgatgggtgttgac 
tatatagatgagtctgaagtattaacgcctgcagacgaagaatatcatttaagaaaagat 

caatttacagttccttttgtgtgtggctgtcgtaacttaggtgaagcagcacgacgcatt
ggtgaaggtgcggcgatgttgcgtacgaaaggtgaacctggtactggtaatattgttgaa 
gctgtccgtcatatgagacgtgttaattctgaagttagccgcttaacagttatgaatgat 
gatgaaattatgacatttgcaaaagatttgggtgcaccttatgaagtattaaaacaaatt 
aaagataatggacgtcttcctgtagttaattttgcagctggtggtgttgctacgcctcag 

gatgcagcactaatgatggaattaggtgcagatggtgtatttgttggttcaggtatattt
aaatctgaagatcctgaaaaatttgctaaagctatcgttcaagctacaacacattatcaa 
gattatgagttaatcggaaaattggctagtgagctaggtacggctatgaaaggtctagat 
attaatcaaatttcactagaagaaagaatgcaagagcgtggttggtaa 

Sequence 1148

MSKIVGSDRVKRGMAEMQKGGVIMDVVNAEQAKIAEEAGAVAVMALERVPSDIRAAGGVA
RMANPKIVEEVMNAVSIPVMAKARIGHITEARVLESMGVDYIDESEVLTPADEEYHLRKD 
QFTVPFVCGCRNLGEAARRIGEGAAMLRTKGEPGTGNIVEAVRHMRRVNSEVSRLTVMND 
DEIMTFAKDLGAPYEVLKQIKDNGRLPVVNFAAGGVATPQDAALMMELGADGVFVGSGIF

KSEDPEKFAKAIVQATTHYQDYELIGKLASELGTAMKGLDINQISLEERMQERGW*

Sequence 1149 

Contig_0564_pos_2969_3316, 

is similar to (with p-value 1.0e-33)

>sp:sp|P37528|YAAE_BACSU HYPOTHETICAL 21.4 KD PROTEIN IN
DACA-SERS INTERGENIC REGION. >gp:gp|D26185|BAC180K_76
B. subtilis DNA, 180 kilobase region of replication origin.
NID: g467326. >gp:gp|Z99104|BSUB0001_12 Bacillus subtilis
complete genome (section 1 of 21): from 1 to 213080.
NID: g2632267.

atgtttggaacatgtgctggattaattgttcttgcaaaaaatgttgaaaatgagtctggt
tatttaaataaattagatataactgttgagcgtaattcattcggtagacaagtcgatagc 
tttgaatctgaacttgatattaaagggatagcaaatgatattgagggagtatttattaga 
gcacctcatattgctaaagtggataacggagtggaaatacttagtaaagttggaggtaaa 
atagtagccgtcaaacaaggacaatacctcggtgtttctttccatccagaactaactgat 

gattatcgtatcactaagtattttattgaacacatgattaaacattaa

Sequence 1150 

MFGTCAGLIVLAKNVENESGYLNKLDITVERNSFGRQVDSFESELDIKGIANDIEGVFIR 
APHIAKVDNGVEILSKVGGKIVAVKQGQYLGVSFHPELTDDYRITKYFIEHMIKH* 


Sequence 1151 

Contig_0564_pos_3912_4925,

putative peptide of unknown function 

atggaacgattttgttgtgtaaatcaaattaactatattcaaatgaatccgttagaagcc 
aaatttaaaacgagcgctctaagatcatggaaaactgatcaggcagatgctcataagctt
gcttgtttaggaccgacgcttaaacaaacagacagcttacctatacatgagttaatattc 
tttgaattaagagaacgcgtccgttttcatctagaaatcgagaatgaacaaaatcgactt 
aaatttcagatccttgaattactccatcaaacattccctggtttagaaagattgtttagt 
agtcgatattcaatcattgcactcaacatcgcagaaatctttactcatccagacatggtt 
cttgatatcgacaaggaggtactgattacacatatattcaattctacagataagggaatg
tcaatggataaagctacaaaatatgcacttcaattaagggtgattgctcaagaaagctat 
cctagtgtcgatagacattcctttctagtcgaaaaattacgcttacttattcaacaatta 
aaacaatctattcatcatctcaaacaattagatgatgccatgattcaattagcacaacaa
ctcgattattttgaaaatattcattcgatacctggtattggtaagctaagcacagctatg 
attattggggagattggtgatattaagcgatttaaatcaaataaacaactcaatgctttt
gttggcattgatatcaaacgatatcaatcaggtcatacacactgtagagataccatcaac 
aagcgtggtaataaaaaagcgagaaaacttttattttgggtgattatgaatataataaga 
gggcagcatcattatgacaatcatgtcgtcgattattactacaaactaagaaagcagcct 
aatgagaaacctcataagactgccatcattgcttgtataaatcgattattaaaaacaatt
cattatcttgtaatgaatcataaattgtacgattatcaaatgtcaccacattag 

Sequence 1152 

MERFCCVNQINYIQMNPLEAKFKTSALRSWKTDQADAHKLACLGPTLKQTDSLPIHELIF 
FELRERVRFHLEIENEQNRLKFQILELLHQTFPGLERLFSSRYSIIALNIAEIFTHPDMV
LDIDKEVLITHIFNSTDKGMSMDKATKYALQLRVIAQESYPSVDRHSFLVEKLRLLIQQL
KQSIHHLKQLDDAMIQLAQQLDYFENIHSIPGIGKLSTAMIIGEIGDIKRFKSNKQLNAF
VGIDIKRYQSGHTHCRDTINKRGNKKARKLLFWVIMNIIRGQHHYDNHVVDYYYKLRKQP
NEKPHKTAIIACINRLLKTIHYLVMNHKLYDYQMSPH*


Sequence 1153 

Contig_0565_pos_776_1195,

is similar to (with p-value 6.0e-53) 

>sp:sp|P32727|NUSA_BACSU N UTILIZATION SUBSTANCE PROTEIN A
HOMOLOG (NUSA PROTEIN). >pir:pir|C36905|C36905 nusA homolog -
Bacillus subtilis >gp:gp|Z18631|BS0RF1T7A_2 B. subtilis
infB-nusA operon. NID: g49314.

gtgctgtcagaagctgaaagaagtcctaatgagaaatatattcctaatgaacgtatcaag 
gtgtacgtaaataaagttgaacagactacaaaaggtccacaaatttacgtatcaagaagt 
catcctggattactaaaacgcttattcgaacaagaagttccagaaatttatgatggtact
gttattgttaaatcagtagcgcgtgaagctggagatcgttctaaaattagcgtgtattct 
gataatcctgatatagatgctgttggcgcatgtgtaggttctaaaggagcacgagtagaa 
gcggttgttgaagaacttggtggcgaaaaaatcgatatcgtccaatgggatgaagatccg 
aaagtatttgttcgtaatgctttaagtccatcacaagttttagaagtaattgttaaataa


Sequence 1154 

VLSEAERSPNEKYIPNERIKVYVNKVEQTTKGPQIYVSRSHPGLLKRLFEQEVPEIYDGT
VIVKSVAREAGDRSKISVYSDNPDIDAVGACVGSKGARVEAVVEELGGEKIDIVQWDEDP
KVFVRNALSPSQVLEVIVK*

Sequence 1155 

Contig_0565_pos_2337_3152,

is similar to (with p-value 1.0e-35) 

>gp:gp|M24523|BACRTP_3 B. subtilis rtp gene, complete cds and
proC gene (put.), 5' end. NID: g143477.

atgaaacttgtattttatggtgctggtaatatggcgcaggcaatttttactggaattatt
aattccaacaatttaaatgcaaatgatatttatttaactaataaatccaatgaacaagca 
ttaaaaagctttgcagaaaaattaggggttaattatagttatgatgatgaagcattactc 

aaagatgccgattatgtatttttaggtacaaagccccatgattttgaaaatttagctaat
cgtattagagaacacattactaatgataataggtttatttctataatggcaggtttatct 
attgattatattcgtcagcagcttaataccaataatccattagctcgtattatgccaaat 
acaaatgctcaagttggacattcggttactggaataagtttttcaaataattttgatcct 
aaatctaaaaatgaagtggatgaattaatcaatgcatttggatcagttatagaagtctcc 

gaagaacatctacatcaagttactgcaattacaggaagtgggcctgcatttttatatcat
gtatttgaacaatatgtaaaagcaggtacagaattaggtttagaacgaaatcaagtcgaa 
gaatctatacgcaatttaattattggaacaagtaaaatgattgagcgttcagacttaagt 
atgtctcaattaaggaaaaatattacatctaaaggtggtactacacaagctggacttgat 
gcactatctcaatatgatattgtatcgatgtttgaagattgtttaggtgcagctgtgaat

agaagtatggaattatcacataaagaagatgaataa

Sequence 1156 

MKLVFYGAGNMAQAIFTGIINSNNLNANDIYLTNKSNEQALKSFAEKLGVNYSYDDEALL
KDADYVFLGTKPHDFENLANRIREHITNDNRFISIMAGLSIDYIRQQLNTNNPLARIMPN 
TNAQVGHSVTGISFSNNFDPKSKNEVDELINAFGSVIEVSEEHLHQVTAITGSGPAFLYH
VFEQYVKAGTELGLERNQVEESIRNLIIGTSKMIERSDLSMSQLRKNITSKGGTTQAGLD 
ALSQYDIVSMFEDCLGAAVNRSMELSHKEDE* 

Sequence 1157

Contig_0565_pos_4011_3202, 

is similar to (with p-value 6.0e-57) 

>sp:sp|P54548|YQJK_BACSU HYPOTHETICAL 34.0 KD PROTEIN IN
GLNQ-ANSR INTERGENIC REGION. >gp:gp|D84432|BACJH642_266
Bacillus subtilis DNA, 283 Kb region containing skin element.
NID: g2627063. >gp:gp|Z99116|BSUB0013_96 Bacillus subtilis
complete genome (section 13 of 21): from 2395261 to 2613730.
NID: g2634723.

gtgggcgagggaacacaacatcaaattttaagacactctattaagttagggaagatagat 
catatttttataacacacatgcatggtgatcatatttttggtctccctggacttttaaca
agtcgttcgtttcaaggtggagaaaataaaccccttactattatagggcctaaaggtatt 
caaaattacatagaaacatctttacaactttctgaatcgcatttgaattatccgattacc
tatatcgaaatcaatcaacaattagcgtatcaccacaatggttttactgtacaagctgaa 
atgcttaaccatggcataccttcattcggatatcgtattgaagccccaatcacgcctggt 
acaatcaatgtagaggccttgagaggtattggactagagcctggtccaaaatatcaggaa
gtcaaattacaagaaacgttcgaatataaaggattaatttacaattcggatgattttaaa 
ggtaaagctaaacctggtccaattatcagtatatttggtgatacaaaaccgtgtgaaaat 
gaatatgaattagcaaagaactcagatttaatgattcatgaagcaacttacattgaagga 
gataagaaacttgctaataattaccatcatagtcatatagacgatgtatttaatctaatt 
aagcaagctaatgtaaataaaagtcttatcactcatatcagtaacagatataacattgat
gaagttacatcaatatacaatgagttatcccttgatcaaacttctccacatttttatttc 
gttaaagattttgatactttcaaaatataa 

Sequence 1158 

VGEGTQHQILRHSIKLGKIDHIFITHMHGDHIFGLPGLLTSRSFQGGENKPLTIIGPKGI
QNYIETSLQLSESHLNYPITYIEINQQLAYHHNGFTVQAEMLNHGIPSFGYRIEAPITPG 
TINVEALRGIGLEPGPKYQEVKLQETFEYKGLIYNSDDFKGKAKPGPIISIFGDTKPCEN 
EYELAKNSDLMIHEATYIEGDKKLANNYHHSHIDDVFNLIKQANVNKSLITHISNRYNID 
EVTSIYNELSLDQTSPHFYFVKDFDTFKI* 

Sequence 1159 

Contig_0565_pos_2176_1418, 
putative peptide of unknown function 

gtgataggcaaacactttattataactggagcaacgagtgggttaggttttgcaataacc 
aatgaattacttcaaagaggggcccatgttactatacttgcaagaaatatagataagttc 
aatcgaatcaaagaaaactattttaaacctgaacatatcaatgtgattaaatgtgattta 
atgcaacgaaaagatattgaatcattacaaaaatttttaaatacacctataaatggtttc 
atctacagttcaggtgttggatattttaagtctataagtgagcattcaactcgtgaagta 
gtagaaacttacgaggttaatcttacaaattttaatttgttatacaaagtgattcaacca 
caattagtaaaagcagcatatatcgttggtatatctagtcaagctgctcttgtttcacag 
gctaatgcggcacattacggtgcatcgaaagcagggtttagcgccgttcttaatgcattg 
agattagaacaaccggaattaaaagtgctcaatgtacagcccggtccaatagatacacca 
ttccaaaaaaacgcagatcctactctaaagtattttaaaaattatagacacatgatgata 
caacctcaacaacttgccaagcaaatagtggaaggaataatactaaataaaattgaaatt 
aatcaaccatcatggatgcaaataatgcttaaattttatcaattatgtccacgtacacta
gaaaaattatgtccaaatctatttaaaaataaagtttaa 

Sequence 1160 

VIGKHFIITGATSGLGFAITNELLQRGAHVTILARNIDKFNRIKENYFKPEHINVIKCDL
MQRKDIESLQKFLNTPINGFIYSSGVGYFKSISEHSTREVVETYEVNLTNFNLLYKVIQP
QLVKAAYIVGISSQAALVSQANAAHYGASKAGFSAVLNALRLEQPELKVLNVQPGPIDTP 
FQKNADPTLKYFKNYRHMMIQPQQLAKQIVEGIILNKIEINQPSWMQIMLKFYQLCPRTL 
EKLCPNLFKNKV*

Sequence 1161

Contig_0566_pos_1171_1491,

putative peptide of unknown function 

atgaatgtacagtttaagaaaggtgctttagaattaattgttctgctaattattaaaaaa 
gaagatcagtatggttattcacttgtacaaaatatctccagatatatgaccatagctgaa
ggtacagtttatcctctgctaaggcgtttggttaaaagtggggaactgagtacgtattat 
caaccttcaactgaaggtccgtctcgaaagtattatcaattaactcaacagggggctgcg 
agagttaatcaattagaggaggattggaaattgtttacggaagctgtagaacatttcatt 
gaggagagtgagaatgaatga 

Sequence 1162 

MNVQFKKGALELIVLLIIKKEDQYGYSLVQNISRYMTIAEGTVYPLLRRLVKSGELSTYY
QPSTEGPSRKYYQLTQQGAARVNQLEEDWKLFTEAVEHFIEESENE*

Sequence 1163

Contig_0566_pos_1536_2048,

putative peptide of unknown function 

atgaataaagaagaaaaagaagatattttgaatgaatacgatacgcatttttatagcgga
cagcaagagggaaagtctgaatcagacgtgtgtaaagaattaggtaatccaaaattaata 

ggtaaggaacttacagctacttccagtgtagaaaatgcacatcaaaaagtgtcgttaatg
aatatttcatccgcaattgtagcagtaatggggctaagtttgcttaacttttttattgtt 
ataataccagcttttttatgcattttgctcgtattaaccttcatcatttttactctagct 
tcactagctgcaccattgatgttgctcattaaaggaattatggatggttttcattccatt 
atcttatatgacgcatttatgactggtttaatgtttggtgttggactcgtacttgcagtg 

gtgacttactatctcattaagtggctatttgatgtgactatgaaatatctaaaatggaat
atctctattgtcaaaggaagtgtacaatcatga 

Sequence 1164 

MNKEEKEDILNEYDTHFYSGQQEGKSESDVCKELGNPKLIGKELTATSSVENAHQKVSLM
NISSAIVAVMGLSLLNFFIVIIPAFLCILLVLTFIIFTLASLAAPLMLLIKGIMDGFHSI
ILYDAFMTGLMFGVGLVLAVVTYYLIKWLFDVTMKYLKWNISIVKGSVQS*

Sequence 1165 

Contig_0566_pos_2105_2791,

putative peptide of unknown function

gtgggagtttatgcacaaaataaaaaattgagtaaagacaatcaatataataatcaaaca 
acaaatttaatgaaaaactatgatgataatactgtgaaaagtatttacattgatggaaaa 
gtaagtgatataactgtgaaaaaaggtaaacatttttcggttaagtccaaagggaatgac 
aaaaatttaaacgtaactagcaaggtgaacaatcaacgttgggtaattacagagcgtcaa 

acaagtccacatattaattttagaatacaaggtaaagttagtaatcacattacgattaca
gtacctaaatatattaaaaacatagatattaaaactaatgccggggatttaaatattgtt 
ggagtaaatagtggcacaggaagatttgatgctgaatctggagacattaaagttcaaaaa 
ggacgatataaaaaggtgacacttcataatgaggatggggatattcaaatgaaacaatta 
gaccctgatattcctttacgtattaaaaatgaagaaggggatataaacttgaattataaa 

aaagaacttcatcacacccaaatcatcactcgtaatgaagaaggggaaacagacatcgat
catcgtgtgttatataatagtaaagttgaaaatggaaataataaagtgaaattaatcaat 
gaaaatggagatattaaagtaaaataa 

Sequence 1166 

VGVYAQNKKLSKDNQYNNQTTNLMKNYDDNTVKSIYIDGKVSDITVKKGKHFSVKSKGND
KNLNVTSKVNNQRWVITERQTSPHINFRIQGKVSNHITITVPKYIKNIDIKTNAGDLNIV 
GVNSGTGRFDAESGDIKVQKGRYKKVTLHNEDGDIQMKQLDPDIPLRIKNEEGDINLNYK
KELHHTQIITRNEEGETDIDHRVLYNSKVENGNNKVKLINENGDIKVK*

Sequence 1167

Contig_0566_pos_3859_4332,

putative peptide of unknown function 

atgttatggcatttaatttttatgattcctacaattattggttactcttttggaatgttc 
tgtttaatatcaagtgaaacttttaaaagtaaaggatttctcttgttgggggtaggaatt 
tttatcttttctttaatttatattttgatttatagtttagttacattaatacctaatgtt
gcgatattatcaagaagattccatgatcgctcaatgacgatgactcttccgattattttt 
tatgttttcactgtcattgtatcaggttttaattatttaccaaatatagataattcagcg 
gtattaatatttatgggaattatctgcttaatctactggattgggtcgatattaatatta 
gtattgacttgcttagatagtaaaacagagtctaataaatatggaccaagtccaaagtac
aatcgtaacgagacaaatttccatggtgataatgctaatccagttgataaataa 

Sequence 1168 

MLWHLIFMIPTIIGYSFGMFCLISSETFKSKGFLLLGVGIFIFSLIYILIYSLVTLIPNV 
AILSRRFHDRSMTMTLPIIFYVFTVIVSGFNYLPNIDNSAVLIFMGIICLIYWIGSILIL
VLTCLDSKTESNKYGPSPKYNRNETNFHGDNANPVDK* 

Sequence 1169 

Contig_0566_pos_7514_8113,

putative peptide of unknown function

atgatggtaaaggttattcacacttatgatgccaatcatcgatggtccgtacaatatgag 
gcaaaatcaacaaaaaaaacagtatttaatccttcaaatcatgtttattttaatttgaat 
cgtgataacaatgttgtctataaccactgtataaatagttcagcattaaaaatgtatatg 
ttaaataataaacatattgttaaagaggggcaatctcttgatttacatcgattattggat 

acaaataaagtctatttaaaagatatttttgaaagtgacaatgaaactttgcaacaacaa
attaatcattataatggcattgatcatcctttcgaatttggtggaaatgagcttaccatt 
gataattctgaatttgaacttaatattaagactgatatgcctcattttgtaatgtttacg 
tttaatgatccacaagtttggaacaatgattttaatatttataaggcgcattctggattt 
tccatcgaaactcaatatatgcctaatgatataaatatgtacggcgccaaagctcagtct 

attttagaggcagatacattatttacatctaaaacaagttttcaaattcatgaaaagtag



Sequence 1170 

MMVKVIHTYDANHRWSVQYEAKSTKKTVFNPSNHVYFNLNRDNNVVYNHCINSSALKMYM 
LNNKHIVKEGQSLDLHRLLDTNKVYLKDIFESDNETLQQQINHYNGIDHPFEFGGNELTI
DNSEFELNIKTDMPHFVMFTFNDPQVWNNDFNIYKAHSGFSIETQYMPNDINMYGAKAQS
ILEADTLFTSKTSFQIHEK* 

Sequence 1171 
Contig_0566_pos_6470_4905,

is similar to (with p-value 1.0e-95) 

>gp:gp|X89408|BSARAABD_2 B.subtilis DNA for araA, araB and
araD genes. NID: g1924929.

gtgatattagctgacacatccaacggacatatcatatcaagatatgaggaagactatgcg 

aacggaacttatatgaactcattatatgataaaccgttacctgaaaactacttcttacaa
aatgctgacgactatttacaaattcttgaacaaggcgttcaatttgtattagaagatagt 
aaagttaataaaaacgatgtggttggaattggagtcgactttacaagcagtacaattatc
tttctcgatgaacaatttgaaccgcttcatcgtcatgaagatttaaagacaaatccacac 
gcgtacgtaaaattatggaaacatcatggagctcaagatgaggcaaactatatgattcag 

atgagtaagaataaaaattggttagattattatggctcaagcgtaaatagcgaatggatg
ataccgaaaatcctggaagttaaacatgaagcaccagaaatacttagaagagcacggtat 
ataatggaagctggagattacatcactagtatactaacaaattcaaatatacgatcaaat 
tgtggtattggttttaaaggtttttgggacaatgaagctggatttaattacgacttcttc 
catagcgtggatcctgatttacctaaaatcgtcaaagaaaaatgtgaagcgccaatcata 

tcaattggagaaagtgcaggtcgtttatgtaaagactatcaacaaatatgggggctttct
caatatgtccaggtttcaccttttatcatagatgcacattctggcgtcttaggtgttggg 
gcaatagaagctggagaattcactgcagtcattggtacaagtacttgtcatctcatgcta 
gattcaagacaagtacccatttcttcaataactggctcagttaaaaatgctattatacct 
ggattatatgcctatgaagctggtcaaccagctgtcggtgatttgtttgaatactcaaag 

aaccaagcacctaaacatattgtagatcaagcaaatgaacatcatatgcatgtgcttaac
tatttagaggaattagcaagtcacattagaatagaagaacaacatgttgttgttttagat 
tggttgaatggaaatcgtagtatacttagtaatagtcatctaactggaagtatctttggt 
cttacacttcaaacaccgtatgaaatgattcatcgagcatatattgaagctacagcattt 
ggaacaaaattaattatgaaacaatttgaagataatcatattcctgttcatacagtgtat 
gcgtctggtggtatcccacaaaagagtaaattactcgttgaaatttatgcaaatgtttta
aataaaagggttgtcgtcatagattcatctaatgcttcagcattaggtgcagcgatgtta 
ggtgcaaatgttgggaatgcatatagtacattaaaagaggcggcattatctatgaagcaa 
cctatagcttatatacaagaacctgaaatccaaaaagttcaagcttataaaccactctac 
cataaatattgtgaactacatgatttattagatcgtcaatatcctgaattatcatatttg
atttaa 

Sequence 1172 

VILADTSNGHIISRYEEDYANGTYMNSLYDKPLPENYFLQNADDYLQILEQGVQFVLEDS
KVNKNDVVGIGVDFTSSTIIFLDEQFEPLHRHEDLKTNPHAYVKLWKHHGAQDEANYMIQ
MSKNKNWLDYYGSSVNSEWMIPKILEVKHEAPEILRRARYIMEAGDYITSILTNSNIRSN 
CGIGFKGFWDNEAGFNYDFFHSVDPDLPKIVKEKCEAPIISIGESAGRLCKDYQQIWGLS 
QYVQVSPFIIDAHSGVLGVGAIEAGEFTAVIGTSTCHLMLDSRQVPISSITGSVKNAIIP
GLYAYEAGQPAVGDLFEYSKNQAPKHIVDQANEHHMHVLNYLEELASHIRIEEQHVVVLD
WLNGNRSILSNSHLTGSIFGLTLQTPYEMIHRAYIEATAFGTKLIMKQFEDNHIPVHTVY
ASGGIPQKSKLLVEIYANVLNKRVVVIDSSNASALGAAMLGANVGNAYSTLKEAALSMKQ 
PIAYIQEPEIQKVQAYKPLYHKYCELHDLLDRQYPELSYLI* 

Sequence 1173 
Contig_0566_pos_854_495,

putative peptide of unknown function 

atgatagaatttgatgcaattaccacattatgtttggcatgtgttttatatttaattggt 
caaacaataatcaaccatgtttctattttaaggcgaatctgtattccagcacctgtcatt
ggtggtcttttatttgcaatattagtggctatattagattcatttaatatcgttaaaata 
aaacttgattcggcgttcattcagaatttctttatgctcgctttctttactacaattgga
cttggcgcatcattaaaactattcaaaattggcggaaaagtcatgttgatagcgagaatg 
tctccatttttaggattttggacaaccattaacgcattgtccatatccttagcaccttga 



Sequence 1174

MIEFDAITTLCLACVLYLIGQTIINHVSILRRICIPAPVIGGLLFAILVAILDSFNIVKI
KLDSAFIQNFFMLAFFTTIGLGASLKLFKIGGKVMLIARMSPFLGFWTTINALSISLAP* 



Sequence 1175

Contig_0568_pos_6584_7120,

is similar to (with p-value 3.0e-26) 

>gp:gp|AF024506|AF024506_1 Bacillus subtilis SecDF protein
(secDF) gene, complete cds. NID: g3220155.

atgataatcaaaccaataattacaattaaaatactaagtgaaataagtggcttagctaat
ttaacaaagtttaacctttcatatgatgtttttaaatcatgtacatctttaccttcatta 
atatcatgtctatccttcttcttaacaccaaataaccagtattgttttttaaagaagttt 
gaagataccagtaatgataacaaccctcttgataagaatactgcggttacaaatatcatt 
aaaatacctaagagtaacatggttgcgaagcctttgactgaactttctccaaagaagaaa 

agcacagctgcagcgatgacagttgttaagttggaatcaaatatagttaagaatgaactt
ttatttgcttttgaatacgcttgtttaagcgtgcgtccaattcttagttcatctttaata 
cgttcatacattatgatattggcatcgacagccatacctacacctaaaactaatgccgcc 
aatccaggtagagttaatacacctgatatgaaattgaatgcgactaaagttaaataa 

Sequence 1176

MIIKPIITIKILSEISGLANLTKFNLSYDVFKSCTSLPSLISCLSFFLTPNNQYCFLKKF
EDTSNDNNPLDKNTAVTNIIKIPKSNMVAKPLTELSPKKKSTAAAMTVVKLESNIVKNEL 
LFAFEYACLSVRPILSSSLIRSYIMILASTAIPTPKTNAANPGRVNTPDMKLNATKVK* 

Sequence 1177

Contig_0568_pos_7956_5731, 

is similar to (with p-value 0.0e+00) 

>gp:gp|AF024506|AF024506_1 Bacillus subtilis SecDF protein
(secDF) gene, complete cds. NID: g3220155.

atgacgtataagaatgtagttaaaaatgttaatttaggtctagatttgcaaggtggtttt
gaagtcctcttccaagtagatcctttaaataaaggagataaaattgataaaaaagcactt 
caagctacatctcaaacattagaaaatcgtgtaaatgttctaggtgtatcagaaccgaaa 
atacaaatcgaagatccaaatcgaattcgtgtacaattagcaggtatcaaggatcaagca 
caagcgcgtaaattattatcgacacaagctaatttaacaattagagatgctgaagatcat
gttttaatgtctggttcagacattaaacaaggctctgctaaacaagaatttaaacaagaa 
actaatcaaccaacagttacatttaaagtaaaaagtaaagataaatttaagaaagtaact 
gaaaagatttctaaaaaacgtgacaatgtcatggtagtttggttagatttcgaaaaaggc
gatagttacaagaaagaagctaaaaagcaacaagaaggtaaaaagcctaaatttatatct 

gcagcgagtgtagaccaacctattaattctagtagtgttgaaatttcaggtggcttcaat
gggaaaaaaggtgttgaagaagcgaaacaaatagctgagttattaaatgccggctcatta 
ccagttgatttaaaagaaatttactctaactctgttggtgcacaatttggtcaagatgct 
cttgataagaccatgtttgcatcaattgtaggtatagcattaatttatttatttatgctt 
ggtttctatcgtttgcctggtttagttgcaatcattgccttaaccacttatatttattta 

actttagtcgcattcaatttcatatcaggtgtattaactctacctggattggcggcatta
gttttaggtgtaggtatggctgtcgatgccaatatcataatgtatgaacgtattaaagat 
gaactaagaattggacgcacgcttaaacaagcgtattcaaaagcaaataaaagttcattc 
ttaactatatttgattccaacttaacaactgtcatcgctgcagctgtgcttttcttcttt 
ggagaaagttcagtcaaaggcttcgcaaccatgttactcttaggtattttaatgatattt 

gtaaccgcagtattcttatcaagagggttgttatcattactggtatcttcaaacttcttt
aaaaaacaatactggttatttggtgttaagaagaaggatagacatgatattaatgaaggt 
aaagatgtacatgatttaaaaacatcatatgaaaggttaaactttgttaaattagctaag 
ccacttatttcacttagtattttaattgtaattattggtttgattatcatttcaatattt 
aaattaaacttaggtattgatttctcatccggaacaagagcagatattcaatctaaaaat 

gctataacacaagcacaggttgagaaaactgtaaaatcagttggattggaaccagatcaa
atacagattaatggtagtggaaataaaaatgccacagttcagtttaaaaaagatttatca 
cgtgaggaagacaataaattaagtgctaaggtgaaatctgaatttggagataatccacaa 
attaataccgtttcacctctcataggccaagagctagctaaaaatgctgtaactgcatta
atacttgcttctataggcattattatctatgtttcactaagatttgaatggcgtatgggt 

ctatcttctgtacttgcattattacatgacgtatttatcatcattgcaatctttagtttg
tttagattagaagtagatttaacatttattgcagcagtattaactatcgttggttattca 
atcaatgatacaatcgtaactttcgaccgtgttcgagaaaatctgcataaagttaaagta 
attacgcatactgatcaaattgatgatatagtcaaccgctctattagacaaactatgaca
cgttctattaatacagtgttgactgtagttgtagttgtagttgcaatattaatattaggt 

gcaccaacaatatttaatttctctttagcattactaattggattattatctggtgtattc
tcgtcaattttcattgctgtaccattatggggcatgcttaagaaacgacagtttaaaaag 
acaaaaaataataaattagtagtacacaaagagaagaaatctaacgatgaaaaaatctta 
gtttaa 

Sequence 1178

MTYKNVVKNVNLGLDLQGGFEVLFQVDPLNKGDKIDKKALQATSQTLENRVNVLGVSEPK
IQIEDPNRIRVQLAGIKDQAQARKLLSTQANLTIRDAEDHVLMSGSDIKQGSAKQEFKQE 
TNQPTVTFKVKSKDKFKKVTEKISKKRDNVMVVWLDFEKGDSYKKEAKKQQEGKKPKFIS 
AASVDQPINSSSVEISGGFNGKKGVEEAKQIAELLNAGSLPVDLKEIYSNSVGAQFGQDA 

LDKTMFASIVGIALIYLFMLGFYRLPGLVAIIALTTYIYLTLVAFNFISGVLTLPGLAAL
VLGVGMAVDANIIMYERIKDELRIGRTLKQAYSKANKSSFLTIFDSNLTTVIAAAVLFFF 
GESSVKGFATMLLLGILMIFVTAVFLSRGLLSLLVSSNFFKKQYWLFGVKKKDRHDINEG
KDVHDLKTSYERLNFVKLAKPLISLSILIVIIGLIIISIFKLNLGIDFSSGTRADIQSKN
AITQAQVEKTVKSVGLEPDQIQINGSGNKNATVQFKKDLSREEDNKLSAKVKSEFGDNPQ 

INTVSPLIGQELAKNAVTALILASIGIIIYVSLRFEWRMGLSSVLALLHDVFIIIAIFSL
FRLEVDLTFIAAVLTIVGYSINDTIVTFDRVRENLHKVKVITHTDQIDDIVNRSIRQTMT 
RSINTVLTVVVVVVAILILGAPTIFNFSLALLIGLLSGVFSSIFIAVPLWGMLKKRQFKK 
TKNNKLVVHKEKKSNDEKILV*

Sequence 1179

Contig_0568_pos_2695_2177,

is similar to (with p-value 2.0e-90) 

>gp:gp|D76414|D76414_1 Staphylococcus aureus gene for
histidyl-tRNA synthetase, ppGpp hydrolase, lytic enzyme,
complete cds. NID: g2580431.

atggatttaaaacaatatgtttcagaagtaaaagattggccttcagcaggtgtaagcttt 
aaggatataacaactataatggataatggtgaagcttatggatatgctacggatcaaatt 
gttgaatacgcaaaggaaaaaaatatagatatagtagttggtcctgaagccagaggattc 
ataatagggtgtccagttgcttactcaatgggtattggatttgctccagtacgtaaagaa
ggaaaactacctcgagaagttattcgttatgaatataatttagaatatggaactaacgta 
ttaactatgcataaagacgcgattaaaccaggacaacgagttttaatcactgatgattta 
ctagctacaggtggaactattgaagctgcaataaagcttgttgaacaattaggtggtata 
gttgtaggtattgcttttattattgaacttaaatatttgaatggaattgataaaataaaa 
gattatgatgtgatgagtttgatttcatatgatgaataa

Sequence 1180 

MDLKQYVSEVKDWPSAGVSFKDITTIMDNGEAYGYATDQIVEYAKEKNIDIVVGPEARGF 
IIGCPVAYSMGIGFAPVRKEGKLPREVIRYEYNLEYGTNVLTMHKDAIKPGQRVLITDDL
LATGGTIEAAIKLVEQLGGIVVGIAFIIELKYLNGIDKIKDYDVMSLISYDE*

Sequence 1181 
Contig_0568_pos_0_1656,
is similar to (with p-value 0.0e+00)
>gp:gp|D76414|D76414_2 Staphylococcus aureus gene for
histidyl-tRNA synthetase, ppGpp hydrolase, lytic enzyme,
complete cds. NID: g2580431.

gtgaataacgagtatccatatagcgcggatgaggtgctttataaagctaaatcatattta 
tcaacaagtgaatatgaatatgttctcaaaagttatcatatagcttatgaggcacataag 

ggtcaatttagaaaaaatggcttaccttatattatgcatcctattcaagttgcagggatt
ttaacagagatgcgtttagacggacccactattgtcgctgggtttctacatgatgtgatt 
gaagatacttcttatacatttgaagatgttaaagatatgtttaatgaagaaattgcacga 
atagtagacggagtaactaaacttaaaaaggttaagtatcgctctaaggaagaacaacaa 
gcagaaaatcatcgtaaactatttattgctattgctaaagatgtacgcgtaattttagtg 

aagttagcagatcggcttcataatatgagaactttaaaggcaatgccaagagagaagcag
gtaagaatctctaaggaaaccttagaaatctatgctcctttagcacatcgtctcggaatc
aacacgataaagtgggaacttgaagatacagcgctgagatatattgacagtgtgcaatat 
ttccgcatcgttaatcttatgaagaaaaaacgtagtgaacgcgaagcttacattacaaat 
gcaatcaataaaattaaaaacgaaatgactaaaatgaatctttcgggcgaaattaacggt 

agaccaaaacatatttacagtatataccgcaaaatgataaaacaaaaaaagcaattcgat
caaatatttgatttgcttgctatacgtattatagttaattcgataaatgattgttatgcg 
acacttggtttagttcatacattatggaaaccgatgcctggacgttttaaagattatata 
gctatgcctaagcaaaatatgtatcaatcactacataccactgtagttggacccaatggc 
gatcctttagaaatacaaattagaacgcacgaaatgcatgaaatcgctgaacatggtgtt 

gctgcacattgggcttataaagaaggtaagacagttaatcaaaaaacacaggattttcaa
aataagcttaattggttaaaagaacttgctgaaaccgaccatacttctgcagatgcgcaa 
gaatttatggaatccttaaaatatgatttacagagcgataaggtatatgcatttactcca 
gctagtgatgttatagagttaccttatggtgcagtaccaattgattttgcttatgcaata 
cacagtgaagtaggaaataaaatgattggtgctaaggttaatggtaaaatcgtacctata 

gattatgttctacaaactggtgatattatagagattcgtacaagtaaacattcttacggt
ccaagtagagactggttgaaaattgtaaaatcttctagtgccaaaagtaaaatcaaaagt
ttctttaaaaaacaagatcgttcttctaatattgaaaaaggtaaatttatggtagaagcg 
gagattaaagaacaaggattccgtgttgaagatattctaactgagaaaaatttagaagtc 
gttaatgaaaaatatcattttgctaatgatgaagatttgtacgcagctgttggattcggt 

ggtgttacatcaatacaaatcgtcaataaattaAGA

Sequence 1182 

VNNEYPYSADEVLYKAKSYLSTSEYEYVLKSYHIAYEAHKGQFRKNGLPYIMHPIQVAGI 
LTEMRLDGPTIVAGFLHDVIEDTSYTFEDVKDMFNEEIARIVDGVTKLKKVKYRSKEEQQ 
AENHRKLFIAIAKDVRVILVKLADRLHNMRTLKAMPREKQVRISKETLEIYAPLAHRLGI
NTIKWELEDTALRYIDSVQYFRIVNLMKKKRSEREAYITNAINKIKNEMTKMNLSGEING 
RPKHIYSIYRKMIKQKKQFDQIFDLLAIRIIVNSINDCYATLGLVHTLWKPMPGRFKDYI
AMPKQNMYQSLHTTVVGPNGDPLEIQIRTHEMHEIAEHGVAAHWAYKEGKTVNQKTQDFQ
NKLNWLKELAETDHTSADAQEFMESLKYDLQSDKVYAFTPASDVIELPYGAVPIDFAYAI 
HSEVGNKMIGAKVNGKIVPIDYVLQTGDIIEIRTSKHSYGPSRDWLKIVKSSSAKSKIKS
FFKKQDRSSNIEKGKFMVEAEIKEQGFRVEDILTEKNLEVVNEKYHFANDEDLYAAVGFG 
GVTSIQIVNKLR 

Sequence 1183

Contig_0569_pos_4130_4513,

is similar to (with p-value 3.0e-45) 

>sp:sp|P52026|DPO1_BACST DNA POLYMERASE I (EC 2.7.7.7) (POL I).
>gp:gp|L42111|BACPOL_1 Bacillus stearothermophilus DNA
polymerase I (pol) gene, complete cds. NID: g806280.

atgtcagacattgttaaagatgcaaaagcacaaggttatgtggaaacactacttcatcgt 
cgtcgatacattcctgatataacaagtagaaacgttaatttaagaagttttgcagaaaga 
acagcaatgaatacacccatacaaggtagtgcagctgacataataaaattagcaatggtt 
aaattcagtgaaaagattaaagaaactaaatatcatgctaagttattattacaagttcat 

gatgaactcatatttgaaataccaaaatcagaagtagaagattttagtaaatttgtagaa
gaaattatggaacaagcattagtgctcgatgtacctttaaaagtagattcgaattatggt 
gcaacatggtacgatgctaaataa 

Sequence 1184 

MSDIVKDAKAQGYVETLLHRRRYIPDITSRNVNLRSFAERTAMNTPIQGSAADIIKLAMV
KFSEKIKETKYHAKLLLQVHDELIFEIPKSEVEDFSKFVEEIMEQALVLDVPLKVDSNYG
ATWYDAK* 

Sequence 1185 
Contig_0569_pos_5426_6028,

is similar to (with p-value 2.0e-31) 

>sp:sp|Q55515|Y553_SYNY3 HYPOTHETICAL 22.5 KD PROTEIN SLR0553.
>gp:gp|D64006|SYCSLLLH_95 Synechocystis sp. PCC6803 complete
genome, 25/27, 3138604-3270709. NID: g1001291.

gtgattgggataactggtggtattgccactggaaaatcaacagtttcagaattattaaca
gcatatgggtttaaaatcgtagatgctgatattgcttcacgcgaagcagttaaaaaaggc 
tctaagggtcttgaacaagttaaagagatttttggggaagaagcaattgacgaaaatggt
gagatgaatcgtcaatatgtaggagagatagtttttaatcatcctgacttacgcgaggct 
cttaatgaaatagttcatcctattgtaagagagataatggaacaagagaaaaacaattat 

ctagaacatggatatcatgtaattatggatatcccattgttgtacgaaaatgaactacaa
gatactgtagatgaagtttgggtggtttatacatctgaaagtattcaaatcgatcgttta 
atggagaggaataatttatcattagaagatgctaaagcacgtgtttatagtcaaatatct 
atagataaaaaaagtaggatggcagatcatgtgatagataatctaggtgataaattagaa 
cttaaacagaatttacaaaaattacttgaagaagaagggtatattcaatcggagagtgaa 

tag

Sequence 1186 

VIGITGGIATGKSTVSELLTAYGFKIVDADIASREAVKKGSKGLEQVKEIFGEEAIDENG
EMNRQYVGEIVFNHPDLREALNEIVHPIVREIMEQEKNNYLEHGYHVIMDIPLLYENELQ 
DTVDEVWVVYTSESIQIDRLMERNNLSLEDAKARVYSQISIDKKSRMADHVIDNLGDKLE
LKQNLQKLLEEEGYIQSESE*

Sequence 1187 

Contig_0569_pos_6872_7897,

is similar to (with p-value 2.0e-96)

>pir:pir|JS0164|JS0164 glyceraldehyde-3-phosphate dehydrogenase
(EC 1.2.1.12) - Bacillus stearothermophilus
>gp:gp|M24493|BACGAPDHA_1 B. stearothermophilus
glyceraldehyde-3-phosphate dehydrogenase gene, complete cds.
NID: g142951.

atggcaacgaatattgcaattaacggtatgggtagaataggtagaatggtgttacgaata
gcactaaataataaaaatttaaatgttaaagcgattaacgctagttatccacctgaaaca 
attgcacatttacttaattatgatacgacgcatggagtttatgataaaaaagttgaaccg 
attgaaagtggtattaaagtgaatggacatgaaattaaattactttctgatcgcaatcca 
gaaaatttaccatggaatgagatggatattgatgttgttatagaagcgacaggtaaattt 
aatcacggagataaagcagttgctcatattaatgcaggtgctaaaaaggtattactcact
ggaccgtctaaaggtggagacgttcaaatgattgttaaaggagtcaatgataatcaactt 
gatattgatacatacgatatttttagtaatgcatcttgtactactaattgtatcggacca 
gttgcaaaagtcctcaatgataaatttggaatcataaatggtctgatgacaactgttcat 
gcaataacaaatgatcaaaaaaatattgataatccacacaaagatttaagaagagcacgt
tcttgtaatgaaagtattattccaacgtcaacaggtgctgctaaagcacttaaagaagta 
ttgcctgaagttgaaggtaaacttcatggaatggctttaagagtaccaacaaaaaatgtc 
tctctcgttgatttagttgttgatttagaacagaatgttacagttacacaagttaatgat 
gcatttaaaaatgccgatttatcaggtgttcttgatgttgaagaagctcctttagtttct 
gtagactttaacacaaatcctcattcagcaattattgattctcaatctacgatggttatg
ggacaaaataaggtgaaagttatcgcttggtatgataatgaatggggttattcgaataga 
gttgttgaagtagctgacaaaattggacaattaattgatgataaagcaatggtaaaagcc 
atttaa 

Sequence 1188

MATNIAINGMGRIGRMVLRIALNNKNLNVKAINASYPPETIAHLLNYDTTHGVYDKKVEP 
IESGIKVNGHEIKLLSDRNPENLPWNEMDIDVVIEATGKFNHGDKAVAHINAGAKKVLLT 
GPSKGGDVQMIVKGVNDNQLDIDTYDIFSNASCTTNCIGPVAKVLNDKFGIINGLMTTVH 
AITNDQKNIDNPHKDLRRARSCNESIIPTSTGAAKALKEVLPEVEGKLHGMALRVPTKNV

SLVDLVVDLEQNVTVTQVNDAFKNADLSGVLDVEEAPLVSVDFNTNPHSAIIDSQSTMVM
GQNKVKVIAWYDNEWGYSNRVVEVADKIGQLIDDKAMVKAI* 

Sequence 1189 

Contig_0569_pos_8888_10258, 

is similar to (with p-value 1.0e-51)

>sp:sp|P07908|DNAB_BACSU REPLICATION INITIATION AND MEMBRANE
ATTACHMENT PROTEIN. >pir:pir|B26580|B26580 replication
initiation protein - Bacillus subtilis
>gp:gp|AF008220|AF008220_191 Bacillus subtilis rrnB-dnaB
genomic region. NID: g2293135. >gp:gp|M15183|BACDNAB_2
B. subtilis dnaB gene, encoding the replication initiation and
membrane attachment protein, complete cds, clone pdnaB12.
NID: g142862. >gp:gp|Z99118|BSUB0015_164 Bacillus subtilis
complete genome (section 15 of 21): from 2795131 to 3013540.
NID: g2635200. >gp:gp|Z75208|BSZ75208_1 B. subtilis genomic
sequence 89009bp. NID: g1769994.

atggggttacaaacctatgaatatggtctaaaaccacaagatggatttgaggtgattaca 
catttcgaattcacctcacaacatttagatattttaaatcgactattcacccctttaatc 
ggagttgaatcaattggactctatcattttatgagtcaattcatagataaaagtcaacaa 
ctcgggttaacgcattatatattcatgaatgaactaaaaattaacttattagatttcagg 

gagcaaatggacaatttagaggctattggattgattaaaacatttgtaaggcatgaagaa
aagtactctcactttgtttatgagttaattcagcctccaacagcctatcaattttttaat 
gatcctatgttatcagtatttttatttagtgaggttgataaaaaacgttatcaagcactt 
aaatcttatttcgaaaaagatgagaaagatttaagcaaatatcaacagacaactagaaaa 
tttacagaagtattcaacgtacctaaaaaggtcaatgtttctgatcaaattaatttaaag 

caaatcaaacactatgatggtatagatttatctaatgaaacttttgattttgaaatgttg
agacagatgttgaaccatcattttattagtaatgaaattatcgataaagaagctaagaat 
ttgattatacaacttgcgacactttatggaattactgaagatggtatgaaaaatgttata 
ttaagttccattaccagtgcacaacaattatcttttgaagaaatgcgtaagaaagctaga 
acttattacctgattgaacatgataatcaattaccaaaattagagcatcaaacaaataaa 

attaacgatgaaaaaaaagatcgacaagcggaagatacaacaaatgattggttacaactg
ctagatgaaacaagtccgattgatatgttagcaagttggtctgattcggaacctacacag 
tcgcaaaaaagtatgatagaagaattgattaaccgtgaaaaaatgaattttggtgtaatc 
aatatacttttacagtttgttatgttaaaagaagatatgaagttgccaaaatcttatatt 
tttgaaattgcttccaactggaagaaaattggtatttcaaatgccaaacaagcatatgaa 

tatgcattacaagttaatcaacctaaaaattacgaaacacattctaatgataaacgacag
aacaatcgtggaagacaaaatcaatttttatccaaagaaaagacacctaaatggcttcaa 
aatagggacgatcaagaagaaaataaagaaataaatgatgacactctcgaagaagatcga 
caagcatttcttgaaaagttaaatcaaaagtggaaggaggaagataactaa 

Sequence 1190

MGLQTYEYGLKPQDGFEVITHFEFTSQHLDILNRLFTPLIGVESIGLYHFMSQFIDKSQQ 
LGLTHYIFMNELKINLLDFREQMDNLEAIGLIKTFVRHEEKYSHFVYELIQPPTAYQFFN 
DPMLSVFLFSEVDKKRYQALKSYFEKDEKDLSKYQQTTRKFTEVFNVPKKVNVSDQINLK 
QIKHYDGIDLSNETFDFEMLRQMLNHHFISNEIIDKEAKNLIIQLATLYGITEDGMKNVI
LSSITSAQQLSFEEMRKKARTYYLIEHDNQLPKLEHQTNKINDEKKDRQAEDTTNDWLQL 
LDETSPIDMLASWSDSEPTQSQKSMIEELINREKMNFGVINILLQFVMLKEDMKLPKSYI 
FEIASNWKKIGISNAKQAYEYALQVNQPKNYETHSNDKRQNNRGRQNQFLSKEKTPKWLQ 
NRDDQEENKEINDDTLEEDRQAFLEKLNQKWKEEDN* 


Sequence 1191 

Contig_0569_pos_10279_11178,

is similar to (with p-value 6.0e-63) 

>sp:sp|P06567|DNAI_BACSU PRIMOSOMAL PROTEIN DNAI.
>pir:pir|B24720|IQBS44 dnaA protein homolog, 44K - Bacillus
subtilis >gp:gp|AF008220|AF008220_192 Bacillus subtilis
rrnB-dnaB genomic region. NID: g2293135.
>gp:gp|X04963|BSDNAB_1 Bacillus subtilis dnaB gene for
initiation of chromosomal replication. NID: g39880.
>gp:gp|Z99118|BSUB0015_163 Bacillus subtilis complete genome
(section 15 of 21): from 2795131 to 3013540. NID: g2635200.
>gp:gp|Z75208|BSZ75208_2 B. subtilis genomic sequence 89009bp.
NID: g1769994.

atgggcgattctcaaaatctagataaacgtatacaaaaaataaaacaaaatgtaatcaat 
gatactgacgttaaacattttcttgagaaaaatcgtagtaatataactaatgagatgata 

gacgaagatttaaatgttcttcaagagtataaagatcaacaaaaagtttatgatggacat
cgctatgatgattgtccgaattttgtaaaaggacatgttcctgaactatatattgaaaat 
gaaagaatcaaaattagatatctaccttgcccgtgtaaaattaaacatgatgaggaacga 
tttgattcacaacttattacatctcaccatatgcaaagagatacacttcatgcaaagctc
aaagatatttatatgaataatcgagagagacttgatgtagcaatggcagctgatcaaatc 

tgtacagcaattactaacgatgaaaaagtaaaggggttatatttatatggtccttttggt
acaggaaaatcattcatattgggtgctattgcaaatcaacttaaatcgcaaaagatttca 
tcaacaattgtatatttaccagaatttattcgcactttaaaaggtggctttaaagacggt 
agttttgagaaaaaattacaacgtgtgcgagaagctaatattttgatgttagatgatatt 
ggcgcagaagaagtcacaccgtgggtaagagatgaagtgattggtcctttattacattat 

agaatggtacatgaacttcctacattttttagttctaactttaattatagtgagcttgag
catcatctttcaataactagagatggcactgaaaagactaaagcagcacgaattattgaa 
agaattaagactttatcgacaccttattatttgactggtaaaaattttagaaacaattga 



Sequence 1192

MGDSQNLDKRIQKIKQNVINDTDVKHFLEKNRSNITNEMIDEDLNVLQEYKDQQKVYDGH 
RYDDCPNFVKGHVPELYIENERIKIRYLPCPCKIKHDEERFDSQLITSHHMQRDTLHAKL 
KDIYMNNRERLDVAMAADQICTAITNDEKVKGLYLYGPFGTGKSFILGAIANQLKSQKIS 
STIVYLPEFIRTLKGGFKDGSFEKKLQRVREANILMLDDIGAEEVTPWVRDEVIGPLLHY 

RMVHELPTFFSSNFNYSELEHHLSITRDGTEKTKAARIIERIKTLSTPYYLTGKNFRNN*



Sequence 1193 
Contig_0569_pos_0_389,
is similar to (with p-value 1.0e-19)

>gp:gp|U56999|TPU56999_1 Treponema pallidum methyl-accepting
chemotaxis protein (mcp-1) gene, complete cds, and potential
regulatory molecule (pfoS/R) gene, partial cds. NID: g1354774.

atgtcgacgataaaaaatatcgatggaccaaaggattttgtttttagagtgttatcaggg
gtagcaattggaatagtagccggactcgttccaaatgcaattttgggagaaatttttaaa 
tactttatgcaatatcatcctattttcaaaactttattaggggtcgttcaagccatccaa
tttacagtgccagcgcttattggagcattgatagctatgaagttcaatatgacaccttta 
gcaatagctgtagtagcaagtgcctcatatgttggtagtggtgcagctcaatttaaacaa
ggtacatgggtaattgcgggaattggggatttaattaatacgatgttaactgcatccatc
gccgtgttacttattttattaatagaaga 

Sequence 1194 

MSTIKNIDGPKDFVFRVLSGVAIGIVAGLVPNAILGEIFKYFMQYHPIFKTLLGVVQAIQ
FTVPALIGALIAMKFNMTPLAIAVVASASYVGSGAAQFKQGTWVIAGIGDLINTMLTASI 
AVLLILLIEX 

Sequence 1195 
Contig_0570_pos_734_1495,

is similar to (with p-value 4.0e-40) 

>sp:sp|P54717|YFIA_BACSU HYPOTHETICAL 29.3 KD PROTEIN IN GLVG-GLVBC
INTERGENIC REGION. >gp:gp|Z99108|BSUB0005_88 Bacillus subtilis
complete genome (section 5 of 21): from 802821 to 1011250.
NID: g2633055. >gp:gp|D50543|D50543_2 Bacillus subtilis DNA for
76-degree region, complete cds. NID: g1486240.
atgattttagatgaacgtgtaaactctaatttcgatcaattaaatgataatgatatacaa 
attgcacattatgttaatacacatatagatgtttgcaaaaatatgaaaatacaagattta 
gcctcacagacacatgcttcaaatgctacgattcatcgcttcactcgtaaactaggtttt 

gatggttatagtgactttaaatcctttttaaaatttgaagatagtaagaatcatcaactt
ccttctgattctatggagcaatttaaacaagaaattgaaaatacattcaactatttagaa 
cgtattgattatcgtttattaactcacaaaatgcatcatgctacaacaatatacttatat 
ggtactggacgtgcacagatgaatgtcgctgaagaagcacaacgtatactgttgactatg
cataaaaatattatattgttacatgatgttcatgaactaaagatggtgttaaacaagaca 

attccagaagatttgtttttcatcatttcactttctggcgaaacacatcaacttaaagaa
gtcacacaattgcttcaactgagacaaaaatattttatttccgtaacaacaatgaaagac 
aatacattggcacaacaagctgattacaatgtctatgtttcaagcaataccttctattta 
aacgatggtactgattattccagttttattagctatcacattttctttgaaacactacta 
agaaaatataacgaatataaagagaatcatgaattaacatag 


Sequence 1196 

MILDERVNSNFDQLNDNDIQIAHYVNTHIDVCKNMKIQDLASQTHASNATIHRFTRKLGF 
DGYSDFKSFLKFEDSKNHQLPSDSMEQFKQEIENTFNYLERIDYRLLTHKMHHATTIYLY 
GTGRAQMNVAEEAQRILLTMHKNIILLHDVHELKMVLNKTIPEDLFFIISLSGETHQLKE
VTQLLQLRQKYFISVTTMKDNTLAQQADYNVYVSSNTFYLNDGTDYSSFISYHIFFETLL
RKYNEYKENHELT* 

Sequence 1197 

Contig_0570_pos_2036_3049,

putative peptide of unknown function

atggaacgattttgttgtgtaaatcaaattaactatattcaaatgaatccgttagaagcc 
aaatttaaaacgagcgctctaagatcatggaaaactgatcaggcagatgctcataagctt 
gcttgtttaggaccgacgcttaaacaaacagacagcttacctatacatgagttaatattc 
tttgaattaagagaacgcgtccgttttcatctagaaatcgagaatgaacaaaatcgactt 

aaatttcagatccttgaattactccatcaaacattccctggtttagaaagattgtttagt
agtcgatattcaatcattgcactcaacatcgcagaaatctttactcatccagacatggtt 
cttgatatcgacaaggaggtactgattacacatatattcaattctacagataagggaatg 
tcaatggataaagctacaaaatatgcacttcaattaagggtgattgctcaagaaagctat 
cctagtgtcgatagacattcctttctagtcgaaaaattacgcttacttattcaacaatta

aaacaatctattcatcatctcaaacaattagatgatgccatgattcaattagcacaacaa
ctcgattattttgaaaatattcattcgatacctggtattggtaagctaagcacagctatg 
attattggggagattggtgatattaagcgatttaaatcaaataaacaactcaatgctttt 
gttggcattgatatcaaacgatatcaatcaggtcatacacactgtagagataccatcaac 
aagcgtggtaataaaaaagcgagaaaacttttattttgggtgattatgaatataataaga 

gggcagcatcattatgacaatcatgtcgtcgattattactacaaactaagaaagcagcct
aatgagaaacctcataagactgccatcattgcttgtataaatcgattattaaaaacaatt 
cattatcttgtaatgaatcataaattgtacgattatcaaatgtcaccacattag 

Sequence 1198

MERFCCVNQINYIQMNPLEAKFKTSALRSWKTDQADAHKLACLGPTLKQTDSLPIHELIF
FELRERVRFHLEIENEQNRLKFQILELLHQTFPGLERLFSSRYSIIALNIAEIFTHPDMV
LDIDKEVLITHIFNSTDKGMSMDKATKYALQLRVIAQESYPSVDRHSFLVEKLRLLIQQL
KQSIHHLKQLDDAMIQLAQQLDYFENIHSIPGIGKLSTAMIIGEIGDIKRFKSNKQLNAF
VGIDIKRYQSGHTHCRDTINKRGNKKARKLLFWVIMNIIRGQHHYDNHVVDYYYKLRKQP
NEKPHKTAIIACINRLLKTIHYLVMNHKLYDYQMSPH* 

Sequence 1199 

Contig_0571_pos_1443_2774,

is similar to (with p-value 0.0e+00)

>sp:sp|P13375|G6PA_BACST GLUCOSE-6-PHOSPHATE ISOMERASE A (GPI A)
(EC 5.3.1.9) (PHOSPHOGLUCOSE ISOMERASE A). >pir:pir|S15936|NUBSSA
glucose-6-phosphate isomerase (EC 5.3.1.9) A - Bacillus
stearothermophilus >gp:gp|X16639|BSPGIA_1 Bacillus stearothermophilus
pgiA gene for phosphoglucoisomerase isoenzyme A (EC 5.3.1.9).
NID: g40045.

atgactcacattcaattagactatggcaaaactttagaattttttgataagcatgaacta 
gatcagcaaaaggatattgttaaaactatccatcaaactattcataaaggtacaggagca 
ggtaatgactttttaggttggttagatttacctgttgattatgataaagaagaattttct 

agaatcgtcgaagcatctaaacgtatcaaatcaaattccgatgtacttgttgttatcggt
attggaggttcatacttaggtgcacgtgctgcaatcgagatgcttacatcttcatttaga 
acaaatacggaataccctgaaattgtatttgtaggtaatcatttatcctcaagttataca 
aaagaattacttgattatttacaaggaaaagatttttcagttaacgttatttcaaaatca 
ggtactacgacagaaccagcagttgcatttagattatttaaacaattggttgaagaaaaa 

tatggaaaagatgaagctaagaaacgtatttttgcaacgacagataaatctaaaggtgca
cttaaacaattagcagacaatgagggttatgagacgtttgttgtacctgatgatgtggga 
ggtcgttattctgttcttacagctgtaggattactaccaattgcaactgcaggtatcaat 
attgaatcaatcatgattggtgcggctaaggcacgtgaagagttatcttctgatgattta 
gatcaaaatatcgcatatcaatatgcaactattcgaaatattttatacagcaaaggttat 

actactgaaatgttaattaattacgaaccctctatgcagtatttcaacgaatggtggaaa
caattatacggtgaatcagaagggaaagatttcaaaggtatttatccatcaagtgcgaat 
tacacaactgatttacattccttaggacaatatgttcaagagggccgtcgtttcttattc 
gagacagtggttaaggtcaaccatccaaaacatgatatcaaaattgaagaggatgcagat 
gatttagacggactgaactatcttgctggcaaatcaatcgatgaagtgaatactaaagca 

tttgaaggtacattacttgcacataccgatggtggcgttccaaatatcgttgtaaatatt
cctcagttagatgaagaaacatttggatatgttgtttatttctttgaattagcttgtgca 
atgagtggatatcaattaggtgttaatccatttaatcaacctggagttgaagcctataaa 
caaaatatgtttgcgctattaggtaaaccaggctttgaagataagaaaaaagaattagaa
aatcgtttataa 


Sequence 1200 

MTHIQLDYGKTLEFFDKHELDQQKDIVKTIHQTIHKGTGAGNDFLGWLDLPVDYDKEEFS 
RIVEASKRIKSNSDVLVVIGIGGSYLGARAAIEMLTSSFRTNTEYPEIVFVGNHLSSSYT 
KELLDYLQGKDFSVNVISKSGTTTEPAVAFRLFKQLVEEKYGKDEAKKRIFATTDKSKGA
LKQLADNEGYETFVVPDDVGGRYSVLTAVGLLPIATAGINIESIMIGAAKAREELSSDDL
DQNIAYQYATIRNILYSKGYTTEMLINYEPSMQYFNEWWKQLYGESEGKDFKGIYPSSAN
YTTDLHSLGQYVQEGRRFLFETVVKVNHPKHDIKIEEDADDLDGLNYLAGKSIDEVNTKA
FEGTLLAHTDGGVPNIVVNIPQLDEETFGYVVYFFELACAMSGYQLGVNPFNQPGVEAYK 
QNMFALLGKPGFEDKKKELENRL* 


Sequence 1201 

Contig_0571_pos_0_1100,

is similar to (with p-value 0.0e+00) 

>sp:sp|P50986|ASSY_STRCL ARGININOSUCCINATE SYNTHASE (EC 6.3.4.5)
(CITRULLINE--ASPARTATE LIGASE). >pir:pir|S57659|S57659
argininosuccinate synthase (EC 6.3.4.5) - Streptomyces clavuligerus
>gp:gp|Z49111|SCARGGH_1 S. clavuligerus argG gene and argH gene
(partial). NID: g886905.

atgaaagataaaatcgttttagcatattcaggtggtttagatacaagcgttgcagttcaa
tggcttattgataaaggatatgatgtagttgcttgttgtcttgacgtaggcgaaggcaaa
gatttagacgttgtatatcaaaaagctttagatatgggtgcagtcgaatgtcatattatt 
gatgcaactaaagaatttagtgatgattatgtaagttatgctattaaaggaaatttaatg 
tatgaaaatgcatatcctctagtttcagcattatcacgtccactcatcgcgaaaaaactg 
gttgaaattgctgaaaaaacaaattctattggtattgcgcatggatgtactggtaaaggt
aatgatcaagtacgtttcgaagtggcaatcaaagctttaaatcctaagttaaaagcattt 
gcacctgttcgtgaatgggcttggagcagagaagaagaaattgattacgcaatcaaacat 
aatattcctgtttcaatcaattatgactcgccatactcaattgaccaaaacttatggggg 
agagctaatgaatgtggtattttagaagatccgtatgccgcacctccggaagatgcattt 

gatttaactacacctttagaagaaactccagacaatgcagacgaaattatccttacattt
aaacaaggtattccagtacaagttgatggcaaagattatcaattagatgaccttattctt 
tacttgaatcaacttgctggcaaacacggtattggtagaatcgatcatgttgaaaacaga 
atggtcgggataaaatcgagagagatttatgaaacacctggtgcggaagttattttaaaa 
gcacacaaagcactagaaacaattacattaactaaagacgtagcgcactttaagcctgtc 

attgaaaaacaattttcagaacaaatatacaatggtttgtggttctcgccattaacagat
agtttaaaactctttatcgatagtactcaacaatatgttgagggagatgtgagaattaaa
ttatttaaagggaacgctattgtcaatggcagacaatctccttacactttatacgatgaa 
aaattagctacttatacGGA 

Sequence 1202

MKDKIVLAYSGGLDTSVAVQWLIDKGYDVVACCLDVGEGKDLDVVYQKALDMGAVECHII 
DATKEFSDDYVSYAIKGNLMYENAYPLVSALSRPLIAKKLVEIAEKTNSIGIAHGCTGKG 
NDQVRFEVAIKALNPKLKAFAPVREWAWSREEEIDYAIKHNIPVSINYDSPYSIDQNLWG 
RANECGILEDPYAAPPEDAFDLTTPLEETPDNADEIILTFKQGIPVQVDGKDYQLDDLIL

YLNQLAGKHGIGRIDHVENRMVGIKSREIYETPGAEVILKAHKALETITLTKDVAHFKPV
IEKQFSEQIYNGLWFSPLTDSLKLFIDSTQQYVEGDVRIKLFKGNAIVNGRQSPYTLYDE 
KLATYTX 

Sequence 1203 
Contig_0572_pos_1688_2902,

is similar to (with p-value 0.0e+00) 

>sp:sp|P39141|NUPC_BACSU PYRIMIDINE NUCLEOSIDE TRANSPORT PROTEIN.
>gp:gp|D45912|D45912_25 Bacillus subtilis genome sequence between
the iol and hut operon, partial and complete cds. NID: g1408482.

atgcatattgtgattgggatattaggaatcattttctttttagcactcgcagttttattt 
agttcagacagaaaaaatattcgctggcgatatgttggattgctattagtaattcaacta 
atatttgcatttatattacttaaaactaatttgggaatttcagttattgggagtatttca 
gatggttttaattatttattagctaaagcagcggtcggtgtcaattttgtatttggtggc 

tttaaatttattgatcctaaacaaccaccattcttctttagcgttttgttacctattgtt
tttatttcagcattgattggtatattacaatatacacgaatacttccactaattattaac 
ttactgggctttttaatttcaaaaattaatggaatgggccgtttagaatcttacaatgcg 
gtcgcggcagcaattctaggacaatctgaagtctttatctcattaaaaaaacaattacct 
tacatacctaaacaacgcttatatacattaactgcttcagcgatgtcaacggtatcagca 

tcaattataggcgcttattttacacttattgaaccaaaatatgttgttactgcagtagtg
cttaacttgtttggtggttttatcattgcatctatcattaatccttataaagtcaatgag 
gaagacgacaaattattaattgatgagaacgaaacaaaaaaacaatctttctttgaaatg 
cttggggagtatatactagatggatttaaagtagcagttattgtaggcgctatgctgata 
ggttatattgcaattattgctttattaaatggaatggtgagtggaatcttaagctttatg 

tctggtggtgctattcaatggaacttccaaacgcttattggatttatttttgcacctttc
gctttcctaactggaataccgtggcaagatgcagttcaatctggttcagtaatggctaca 
aaattactatctaatgaatttgtagcaatgcaagatttaggtaaagcgactggattatcg 
gaacatgctaaaggaattacctctgtcttcttagtatcattcgcaaactttagttcaatt 
ggtattatttcaggagctattaaatcattgaatgatgaaaaaggtgacgttgttgctcgt 

ttcggaataaaattattatttggtgcaacacttgtttcgtttatatcagcggctattgca
ggattctttatctaa 

Sequence 1204 

MHIVIGILGIIFFLALAVLFSSDRKNIRWRYVGLLLVIQLIFAFILLKTNLGISVIGSIS
DGFNYLLAKAAVGVNFVFGGFKFIDPKQPPFFFSVLLPIVFISALIGILQYTRILPLIIN
LLGFLISKINGMGRLESYNAVAAAILGQSEVFISLKKQLPYIPKQRLYTLTASAMSTVSA
SIIGAYFTLIEPKYVVTAVVLNLFGGFIIASIINPYKVNEEDDKLLIDENETKKQSFFEM
LGEYILDGFKVAVIVGAMLIGYIAIIALLNGMVSGILSFMSGGAIQWNFQTLIGFIFAPF
AFLTGIPWQDAVQSGSVMATKLLSNEFVAMQDLGKATGLSEHAKGITSVFLVSFANFSSI
GIISGAIKSLNDEKGDVVARFGIKLLFGATLVSFISAAIAGFFI*

Sequence 1205 
Contig_0572_pos_6336_6677, 
is similar to (with p-value 1.0e-28)

>gp:gp|U35659|SBU35659_1 Streptococcus bovis malic enzyme gene,
complete cds. NID: g1006838.

gtgtcagttgtatctagagctattggcacaccattaatacccgcaaaacttttgaacagt 
gctgcctttccttccattactggaatacttgcttctgctccaatattccctaaaccgaga 
acagcagttccatctgttacaacagcaactgtatttcctttaatagtgtactcatatact
tttcttgaatcttcatggatttctttacaaggttctgcaacgccaggtgagtatgctagg 
cttaattgttgcttatttgtcactttaacatttggtgtaatttctagtttaccttggttc 
tctctatgcatttctaaagcgtcatctcttaaagacatttaa 

Sequence 1206

VSVVSRAIGTPLIPAKLLNSAAFPSITGILASAPIFPKPRTAVPSVTTATVFPLIVYSYT
FLESSWISLQGSATPGEYARLNCCLFVTLTFGVISSLPWFSLCISKASSLKDI * 

Sequence 1207 
Contig_0572_pos_11556_11170,

putative peptide of unknown function 

atgcaattattaggtagacaacctcaaattggtgaaacagtcaatgatcaaattgccaaa 
catatttcaatacatcaacaaggaataaatgtcgacgtatctccactcattacaaatcat 
tacggtacactaagtaaagcagtatttgttggaattattgaggaaacgattcggcacgaa 
atgagaaagtataaaaaaggtaatgtcatgatagaaagtatgagtattatctatattaag
actgtaccaattgaatctactattgaagtacattatgaaatgttagatgttggccgatat
tttgctaaattagaagttactatgattaataatggtgaaaaagttgctaatgcattagta 
atttgtcaaatgtttgatgggttttaa 

Sequence 1208

MQLLGRQPQIGETVNDQIAKHISIHQQGINVDVSPLITNHYGTLSKAVFVGIIEETIRHE 
MRKYKKGNVMIESMSIIYIKTVPIESTIEVHYEMLDVGRYFAKLEVTMINNGEKVANALV
ICQMFDGF* 

Sequence 1209

Contig_0572_pos_11136_10198,

is similar to (with p-value 4.0e-31) 

>sp:sp|P22746|MGPA_MYCGE MGPA PROTEIN. >pir:pir|A64221|A64221 MgPa
operon 29K protein homolog MG190 - Mycoplasma genitalium (SGC3)
>gp:gp|U39698|U39698_5 Mycoplasma genitalium section 20 of 51 of
the complete genome. NID: g3844782.
atggaagtaaaaatgaatgaaataatggaagcattagaacaaagtgaattaattattatt 
cacagacatctaagaccagatccagacgcatatggttcacaattaggtttgaaatattac 
ttacaaaagaagtttccaaacaaacaaatttatgctgtaggagctaatgaagattctttg 

aaatttataggtttgatggacgaaattgacaatgatatatacaagaaagcgactgtagtt
gtatgtgatacggcaaatgcgccacgaatagatgaccaacgttatgatacaggtaccaaa 
cttttgaaaattgatcatcatcctgctactgatcagtatggagatattaactatgttaat 
accaaagcttcttccactagtgaaataatttacgaattcatttcacatttcaatgatgaa 
catatcattgatgaacaagttgctagagtattatatcttggcatcgttggtgatactgga 

cgttttttatttaataatacaacgccacgaacaatgcaaattgctggaaaattacttaca
tatccttttgatcacaaccaagaattaaacaaaatgtctgaaaaggatccaaaactatta 
ccatttcaaggatatatattgcaaaattttgatttaaatgataaaggattttgcaaagtt 
aaaataactaaagacatacttgaaaaatttcaaatacaacctaatgaagcgtctttattt
gtaaatacaatcgcagatattcgaggattaaaaatatggatgtttggcgttgatgaagga
gatcaaattagatgtcggttgcgttctaaaggtcatattattattaatgatgtcgctaat
acatttggtggtggtggacatccaaatgcatctggagtttcagtaaatagttgggagcaa 
ttcgagcaactcgccgaagctttaaacgacaagttataa 

Sequence 1210

MEVKMNEIMEALEQSELIIIHRHLRPDPDAYGSQLGLKYYLQKKFPNKQIYAVGANEDSL
KFIGLMDEIDNDIYKKATVVVCDTANAPRIDDQRYDTGTKLLKIDHHPATDQYGDINYVN
TKASSTSEIIYEFISHFNDEHIIDEQVARVLYLGIVGDTGRFLFNNTTPRTMQIAGKLLT
YPFDHNQELNKMSEKDPKLLPFQGYILQNFDLNDKGFCKVKITKDILEKFQIQPNEASLF
VNTIADIRGLKIWMFGVDEGDQIRCRLRSKGHIIINDVANTFGGGGHPNASGVSVNSWEQ
FEQLAEALNDKL*

Sequence 1211 

Contig_0572_pos_10174_6977,

is similar to (with p-value 0.0e+00)

>sp:sp|P14567|DP3A_SALTY DNA POLYMERASE III, ALPHA CHAIN (EC 2.7.7.7).
>pir:pir|A45915|A45915 DNA-directed DNA polymerase (EC 2.7.7.7) III
alpha chain - Salmonella typhimurium >gp:gp|M29701|STYDNAE_1
S. typhimurium polymerase III polymerase subunit gene, complete cds.
NID: g153951.

atggtagcacatttaaatattcatacttcttttgacctgttagattctagtttaagaatt 
gatgcattaatagataaagctaaaaaagaaggatatcgtgcgcttgcaataaccgataca 
aatgtattgtacggttatccaaagttttatgatgcttgtattgcagctcacatacatcca 
atctttggtatgactatatatttaacggatggtctctatactattgaaacggttgtttta

gcaaaaaataatcaaggactcaagtcattatatcaactttcttctgctataatgatgaga
aataaagaagaagtgccaattgaatggctaaaaagatacgacgaacatttaattatcata 
tttaaagaggctgagttgtctcataagcaagttattgatgcttttgaaggtaagaaagaa 
ttatatttaaatcacaatagtaataatacattgactggcaaacgtgtatggatgcaatct 
gcaagatacttaaatgaagatgatgctgaaaccattccagcgttacatgccataagagat 

aatactaagttagatttaatacatgagaaagaaacacttgatgaacattttcctagtata
gaagaacttcaaacactaaatcttagtgaagatatgattactaacgcgaatgaaattgaa 
gaattatgccaagcagaaattgcataccatcaatccctgttgccacaatttgtgacacct
aatggtgaaacttcgaaagattatctttggacgatacttatacataggttacgagaatgg 
gaacttaatgataaaacttatttcaatcggttgaaacatgaatataaaattattactgat

atgggtttcgaggattattttcttattgtaagtgatttgattcattttgctaaaacacat
gaagtgatggttgggccaggtcgtggttcatcagcagggtcattagtaagttatttatta 
ggtattactactatagacccgttaaaatataatcttttatttgaaagatttcttaatcct
gaacgcgtaactatgccagatattgatattgattttgaagacacgagacgtgaaaaagta 
attaagtatgtacaagataaatatggtgaacatcatgtatcaggtattgtgacatttggg 

catctgttagctcgtgctgttgctagagatgtaggaagaataatgggatttgatgaaacg
agtttaaatgagatttcaaaacttattccacataaattaggtataactcttgaagaagca 
taccaaaagccagagtttaaagcatttgttcatcgtaatcatagaaatgaacgttggttt 
gaagtgagtaaaaagttagagggattaccaagacatacgtctacgcatgctgcaggtatc 
attatcaatgatcaaccattattcaaatttgccccattaacaactggtgatacaggatta 

ttaacgcagtggactatgacagaagcggaacgtataggattattaaaaattgatttcttg
ggattacgcaatctatcaattattcatcaaattattttacaagttaaaaaggatttaaat 
ataaatattgatatagaagctataccttatgatgataaaaaagtttttgatttattatca 
aacggtgacactacaggtatatttcaattggaatcagacggtgttagaagcgtattaaaa 
agattgcaacccgaacattttgaagatatcgtagctgtcacatcattatatagaccagga 

ccaatggaagaaataccaacttatataacccgtagacataatcctaaccaatttgcttat
ttacatccagatttagaaccaatcttaaaaaacacatatggtgttatcatttatcaagaa 
caaataatgctaatagcaagtcaagttgctggttttagttatggtgaagcagatatttta 
agaagggcaatgagtaaaaagaatcgtgcaatcttagaaagtgagcgtcaacatttcatt 
gatggtgcaaaaaataacggttacgatgaacagataagtaagcaaatttttgatttaata 

cttaagtttgcagattatgggttcccacgtgcccatgctgttagttactcaaaaattgca
tacattatgagctatttaaaagtgcactatcctcattatttttatgcaaatatcttgagt 
aatgtaataggaagtgaaaaaaagactgcagctatgattgacgaagctaagcaccaaaga 
attagcatcttgcctcccaatattaatcaaagtcattggtattataaggcaagtaataaa 
ggaatatatctgtctttaggtacaattaaaggaattggatatcaaagcgttaaattaatt
attgatgaacgtcagcagaatggaccttatagagatttctttgatttttcaagacgtata
ccaaaaagggtgaaaaatagaaaattacttgagtctcttatcttagtaggcgcattcgac 
acttttggcaaaactagagcgacattattacaagcaattgatcaagtattagatttgaat 
tctgatgttgagcaagatgaaatgcttttcgatcttttaactcctaaacaatcgtatgaa 
gaaaaagaggaactacctgatcaattattaagtgattatgaaaaagaatacctaggattc
tatattagtaaacatccagttgaaaagaaatttgaaaagaaacaatatttaggcatattt 
caattgtctaatggaagtcactaccaacctatacttgttcaatttgaccatatcaaacaa 
ataagaacgaagaatggtcaaaatatggcatttgtaacgatgaatgatggaagaacgatg 
atggatggagtgattttcccagataagtttaaaaaatacgaaacttctatttcaaaggaa 

cagatgtatatcgtattaggtaaatttgaaaagcgtaaccaacaaatgcaacttatcatc
aatcaactttttgaagttgaagcgtatgagcaaacaaaattgtctaattcgaaaaaagtt 
attttacgtaatgtaacacatctagaaccacaatttgaacattcaaaagtagaatctaat 
gaacaacatgcattaaatatttatggttttgacgaaagtgcaaataagatgacaatgttg 
ggacaaattgaacgtcaacgtcaaaattttgatctattaatacaaacttattcgccagct 

gatattagatttatttaa

Sequence 1212 

MVAHLNIHTSFDLLDSSLRIDALIDKAKKEGYRALAITDTNVLYGYPKFYDACIAAHIHP
IFGMTIYLTDGLYTIETVVLAKNNQGLKSLYQLSSAIMMRNKEEVPIEWLKRYDEHLIII

FKEAELSHKQVIDAFEGKKELYLNHNSNNTLTGKRVWMQSARYLNEDDAETIPALHAIRD
NTKLDLIHEKETLDEHFPSIEELQTLNLSEDMITNANEIEELCQAEIAYHQSLLPQFVTP
NGETSKDYLWTILIHRLREWELNDKTYFNRLKHEYKIITDMGFEDYFLIVSDLIHFAKTH
EVMVGPGRGSSAGSLVSYLLGITTIDPLKYNLLFERFLNPERVTMPDIDIDFEDTRREKV
IKYVQDKYGEHHVSGIVTFGHLLARAVARDVGRIMGFDETSLNEISKLIPHKLGITLEEA

YQKPEFKAFVHRNHRNERWFEVSKKLEGLPRHTSTHAAGIIINDQPLFKFAPLTTGDTGL
LTQWTMTEAERIGLLKIDFLGLRNLSIIHQIILQVKKDLNINIDIEAIPYDDKKVFDLLS
NGDTTGIFQLESDGVRSVLKRLQPEHFEDIVAVTSLYRPGPMEEIPTYITRRHNPNQFAY
LHPDLEPILKNTYGVIIYQEQIMLIASQVAGFSYGEADILRRAMSKKNRAILESERQHFI
DGAKNNGYDEQISKQIFDLILKFADYGFPRAHAVSYSKIAYIMSYLKVHYPHYFYANILS

NVIGSEKKTAAMIDEAKHQRISILPPNINQSHWYYKASNKGIYLSLGTIKGIGYQSVKLI
IDERQQNGPYRDFFDFSRRIPKRVKNRKLLESLILVGAFDTFGKTRATLLQAIDQVLDLN
SDVEQDEMLFDLLTPKQSYEEKEELPDQLLSDYEKEYLGFYISKHPVEKKFEKKQYLGIF
QLSNGSHYQPILVQFDHIKQIRTKNGQNMAFVTMNDGRTMMDGVIFPDKFKKYETSISKE
QMYIVLGKFEKRNQQMQLIINQLFEVEAYEQTKLSNSKKVILRNVTHLEPQFEHSKVESN 

EQHALNIYGFDESANKMTMLGQIERQRQNFDLLIQTYSPADIRFI*

Sequence 1213 
Contig_0572_pos_6673_6230, 
is similar to (with p-value 6.0e-41) 
>gp:gp|U35659|SBU35659_1 Streptococcus bovis malic enzyme gene,
complete cds. NID: g1006838.

atgtctttaagagatgacgctttagaaatgcatagagagaaccaaggtaaactagaaatt 
acaccaaatgttaaagtgacaaataagcaacaattaagcctagcatactcacctggcgtt 
gcagaaccttgtaaagaaatccatgaagattcaagaaaagtatatgagtacactattaaa
ggaaatacagttgctgttgtaacagatggaactgctgttctcggtttagggaatattgga
gcagaagcaagtattccagtaatggaaggaaaggcagcactgttcaaaagttttgcgggt 
attaatggtgtgccaatagctctagatacaactgacactcaagaaatcataaaaacagta 
aaacttattgcaccaaactatggtggaattaatcttgaagatatatcagctcccatttta 
tattggttcaaaacatggtattga 


Sequence 1214 

MSLRDDALEMHRENQGKLEITPNVKVTNKQQLSLAYSPGVAEPCKEIHEDSRKVYEYTIK
GNTVAVVTDGTAVLGLGNIGAEASIPVMEGKAALFKSFAGINGVPIALDTTDTQEIIKTV
KLIAPNYGGINLEDISAPILYWFKTWY* 


Sequence 1215 

Contig_0572_pos_6222_5281,

is similar to (with p-value 4.0e-48) 

>gp:gp|AF068902|AF068902_4 Streptococcus pneumoniae D-glutamic acid
adding enzyme MurD (murD), undecaprenyl-PP-MurNAc-pentapeptide-UDPGlcNAc
GlcNAc transferase (murG), cell division protein DivIB (divIB),
orotidine-5'-decarboxylase PyrF (pyrF), and orotate
phosphoribosyltransferase PyrE (pyrE) genes, complete cds; and
unknown genes. NID: g4009477.

atgatagagtcacaactccctgatattcaatattatccaatatcaagcggtaaattacgt 
cgttatctatcttttgaaaatgcaaaagatgtctttaaagttttgaaaggaattttagat 
gcacgtaaaatacttaaaaaacaaaaaccagacttacttttttcaaaaggtggttttgtt 
agtgttccggtagttatagccgcacgttctttaaaaattccaactatcatacacgaatca 
gatttaactcctggattagctaataaaatttctttaaaatttgctaagaaaatatacaca
acctttgaagatacacttacatatcttccaaaagataaagctgattttgttggggctact 
gtacgtgaggacttaaaacaagggaataaagaaagaggatatcaactcactgattttgat 
aaaaataaaaaagtgttattagtcatgggaggaagtttaggtagtaaaaaacttaataat 
atcattcgtcaaaatattgaggcacttctccacgattatcaaattatacacttaactgga 
aaaggacttgttgatgactcaatcaataaaaaaggttatgttcaatttgaatttgttaaa
gacgacttaactgatttattagcaatcactgatactgttgtaagtcgtgcaggttctaac
gcaatttatgaatttttaacgctacgtataccgatgttactcatccccttaggacttgat 
caatcaagaggagatcaaattgataatgctaaaaactttgaatctaagggttatggtcgt
catattcctgaagatcaacttacagaagttaacttattgcaagaattaaatgatattgaa 
ttacatcgtgaatctattattaaacaaatggaaacatatcaagagagttacacgaaagaa
gatttatttgataaaattattcatgatgcattaaacaagtag 

Sequence 1216 

MIESQLPDIQYYPISSGKLRRYLSFENAKDVFKVLKGILDARKILKKQKPDLLFSKGGFV
SVPVVIAARSLKIPTIIHESDLTPGLANKISLKFAKKIYTTFEDTLTYLPKDKADFVGAT
VREDLKQGNKERGYQLTDFDKNKKVLLVMGGSLGSKKLNNIIRQNIEALLHDYQIIHLTG
KGLVDDSINKKGYVQFEFVKDDLTDLLAITDTVVSRAGSNAIYEFLTLRIPMLLIPLGLD
QSRGDQIDNAKNFESKGYGRHIPEDQLTEVNLLQELNDIELHRESIIKQMETYQESYTKE 
DLFDKIIHDALNK* 

Sequence 1217 

Contig_0572_pos_5268_4654,
is similar to (with p-value 3.0e-21) 

>pir:pir|S32217|S32217 hypothetical protein 2 - Bacillus megaterium
>gp:gp|Z21972|BMCTP450A_3 B.megaterium cytochrome P450meg, ORF1 and
ORF2 genes. NID: g288298.

atgaatcgatggaaacgcatttcattgcttattgtttttacacttatttttggtataata 
gctttttttcatgaatcaaggcttggaaaatggatagataacgaagtatatgaatttatt 
tattcatctgaaagtttcattaccacatctattatgttaggtgtaacaaaaattggtgaa 
gtttgggcaatggttgcgctatccttattattagttgcttaccttatgctaaaacgcttc 
aagattgagacattattctttgtaatagtaatgagcttatctagtacactcaatccacta 
ttaaagaatatctttgatagggaacgtccaacattattgcgtttaattgacatttcaggc 
tttagttttccaagcggtcatgctatgggctcaacttcattctttggaagcgctatatat 
gtaataaaccgtcatgattcgggtatctctaaaggcgtgttaatcggtttatgcgcactt 
ttcattttattaatatcaacttctagagtgtatctaggcgttcattaccctacagatatt 
attgccggcattattggtggtgtattctgccttttactcagtactttattactacctaaa 
cagttaatagcttag 

Sequence 1218 

MNRWKRISLLIVFTLIFGIIAFFHESRLGKWIDNEVYEFIYSSESFITTSIMLGVTKIGE
VWAMVALSLLLVAYLMLKRFKIETLFFVIVMSLSSTLNPLLKNIFDRERPTLLRLIDISG
FSFPSGHAMGSTSFFGSAIYVINRHDSGISKGVLIGLCALFILLISTSRVYLGVHYPTDI
IAGIIGGVFCLLLSTLLLPKQLIA* 

Sequence 1219

Contig_0572_pos_4081_3068, 

putative peptide of unknown function 

atggaacgattttgttgtgtaaatcaaattaactatattcaaatgaatccgttagaagcc 
aaatttaaaacgagcgctctaagatcatggaaaactgatcaggcagatgctcataagctt
gcttgtttaggaccgacgcttaaacaaacagacaacttacctatacatgagttaatattc
tttgaattaagagaacgcgtccgttttcatctagaaatcgagaatgaacaaaatcgactt 
aaatttcagatccttgaattactccatcaaacattccctggtttagaaagattatttagt 
agtcgatattcaatcattgcactcaacatcgcagaaatctttactcatccagacatggtt
cttgatatcgacaaggaggtactgattacacatatattcaattctacagataagggaatg
tcaatggataaagctacaaaatatgcacttcaattaagggtgattgctcaagaaagctat 
cctaatgtcgatagacattcctttctagtcgaaaaattacgcttacttattcaacaatta 
aaacaatctattcatcatctcaaacaattagatgatgccatgattcaattagcacaacaa 
ctcgattattttgaaaatattcattcgatacctggtattggtaagctaagcacagctatg 

attattggggagattggtgatattaagcgatttaaatcaaataaacaactcaatgctttt
gttggcattgatatcaaacgatatcaatcaggtcatacacactgtagagataccatcaac 
aagcgtggtaataaaaaagcgagaaaacttttattttgggtgattatgaatataataaga 
gggcagcatcattatgacaatcatgtcgtcgattattactacaaactaagaaagcagcct 
aatgagaaacctcataagactgccatcattgcttgtataaatcgattattaaaaacaatt 

cattatcttgtaatgaatcataaattgtacgattatcaaatgtcaccacattag

Sequence 1220 

MERFCCVNQINYIQMNPLEAKFKTSALRSWKTDQADAHKLACLGPTLKQTDNLPIHELIF 
FELRERVRFHLEIENEQNRLKFQILELLHQTFPGLERLFSSRYSIIALNIAEIFTHPDMV 
LDIDKEVLITHIFNSTDKGMSMDKATKYALQLRVIAQESYPNVDRHSFLVEKLRLLIQQL
KQSIHHLKQLDDAMIQLAQQLDYFENIHSIPGIGKLSTAMIIGEIGDIKRFKSNKQLNAF 
VGIDIKRYQSGHTHCRDTINKRGNKKARKLLFWVIMNIIRGQHHYDNHVVDYYYKLRKQP 
NEKPHKTAIIACINRLLKTIHYLVMNHKLYDYQMSPH*

Sequence 1221

Contig_0572_pos_1540_1070, 

is similar to (with p-value 3.0e-36) 

>sp:sp|P37568|YACG_BACSU HYPOTHETICAL 17.7 KD PROTEIN IN LYSS-MECB
INTERGENIC REGION. >gp:gp|D26185|BAC180K_145 B. subtilis DNA, 180
kilobase region of replication origin. NID: g467326.
>gp:gp|Z99104|BSUB0001_83 Bacillus subtilis complete genome
(section 1 of 21): from 1 to 213080. NID: g2632267.
gtgatatctatgcacaatatgtccgacatcatagaacaatacattaagcggttatttgaa 
gaagcagatgaagatgttgtagaaatacaacgcgctcatattgctcaacgtttcgattgt 

gttccttctcaacttaactatgttattaagacacgttttactaatgaacatggttatgaa
atagaaagtaaacgtggtggcggtggttacattcgaatcactaaaattgaaaataaagat 
gctacaggttatattaatcacttactacaattaataggtccatctatttctcaacaacaa 
gggtattatgtcatagatggtttgttagataaaggtttgatcaatgaaagagaagctaaa 
atgatacagaccattattgatagagaaactttaaaaatggatgttgttgcacgcgatatt 

attagagctaatatcttaaaacgattactaccagttattaattattactag

Sequence 1222 

VISMHNMSDIIEQYIKRLFEEADEDVVEIQRAHIAQRFDCVPSQLNYVIKTRFTNEHGYE 
IESKRGGGGYIRITKIENKDATGYINHLLQLIGPSISQQQGYYVIDGLLDKGLINEREAK 
MIQTIIDRETLKMDVVARDIIRANILKRLLPVINYY*

Sequence 1223 
Contig_0572_pos_1064_486,
is similar to (with p-value 7.0e-19)
>gp:gp|U40604|LMU40604_2 Listeria monocytogenes ClpC ATPase
(mec) gene, complete cds. NID: g1314293.

gtgaggtgtttaaaattgctttgtgaaaattgccattttaatgaagcggaagttaaactt
actgttaaaggtatagatagtacgcatgaaaaatgggtatgttcagtatgtgcccaagga
gaaaacccctggttacattctaacgatgataatacgtatcatacacaccaagacgatata 
gaagaagcatttgtagtgaaacagatacttcaacaccttgctgcaaaacatggtattaat
tttcatgagatggcatttaaagaagaaaaaaaatgcccaacgtgtcagatgacacttaag 
gatattgcacatgttggtaagcttgggtgtgctgattgttatgctacgtttaaagaagac 
atcattgatatagttcaacgtgttcaaggtggtcaatttgaacatgtaggaaaaacacca
caatcatcgtataagaaacttgcaataaaaaagcaaattgaagaaaaatcaaaatatcta
aataaattgatagatggtcaagagtttgaagaggcagcgattgttcgtgatgaaattaaa
gctttaaaaagtgagagcgaggtgtctcatgatgagtaa 

Sequence 1224 

VRCLKLLCENCHFNEAEVKLTVKGIDSTHEKWVCSVCAQGENPWLHSNDDNTYHTHQDDI
EEAFVVKQILQHLAAKHGINFHEMAFKEEKKCPTCQMTLKDIAHVGKLGCADCYATFKED
IIDIVQRVQGGQFEHVGKTPQSSYKKLAIKKQIEEKSKYLNKLIDGQEFEEAAIVRDEIK 
ALKSESEVSHDE* 

Sequence 1225

Contig_0572_pos_0_454,

is similar to (with p-value 2.0e-27) 

>sp:sp|P37570|YACI_BACSU HYPOTHETICAL 41.1 KD PROTEIN IN LYSS-MECB
INTERGENIC REGION (ORFX). >gp:gp|D26185|BAC180K_147 B. subtilis DNA,
180 kilobase region of replication origin. NID: g467326.
>gp:gp|Z99104|BSUB0001_85 Bacillus subtilis complete genome
(section 1 of 21): from 1 to 213080. NID: g2632267.

atgtctgaggagacacctgttattatttcttccagaattcgattagctagaaatcttgaa 
aaccatgtccacccacttatgttcccttcagagcaagaaggatatcgagtgataaatgaa
gttcaagatgcgctttccaacttaactttaaatcgattagatacgatggatcaacaaagt 
aaaatgaaattggttgcgaaacatcttgtgagtcctgaactagtgaaacaacctgcttca 
gcagtaatgttaaatgatgatgaatcggtaagtgttatgataaacgaagaagatcatata
cgaatacaggctctaggaactgatttatcgctaaaggatttatatcaacgcgcttctaaa 
attgatgatgaattagataaagcgttagacattagttatgatgagcatttaggatattta
actacctgtcctactaatattggtacaggaatgc 

Sequence 1226 

MSEETPVIISSRIRLARNLENHVHPLMFPSEQEGYRVINEVQDALSNLTLNRLDTMDQQS
KMKLVAKHLVSPELVKQPASAVMLNDDESVSVMINEEDHIRIQALGTDLSLKDLYQRASK
IDDELDKALDISYDEHLGYLTTCPTNIGTGMX 

Sequence 1227 

Contig_0573_pos_9024_8332, 

is similar to (with p-value 2.0e-91)

>sp:sp|Q53726|PCRB_STAAU PCRB PROTEIN. >pir:pir|S39922|S39922 pcrB
protein - Staphylococcus aureus >gp:gp|M63176|STAPCRA_1
Staphylococcus aureus helicase required for T181 replication (pcrA)
gene, complete cds. NID: g153060.

atgtacgacataaccaagtggaaacatatgtttaaattagatccggctaaatcaatttcg
gatgaaaatttagaggcactgtgtatgtctaacactgatgcaataattattggtgggaca 
gatgatgtaacagaagataatgttattcatttaatgagtagagtaagacgttatccgtta 
ccacttgtcttagaagtttcgaatgtagaaagtgtgatgcctggttttgatttctatttt 
attccaacagtcatgaatagtaaggatacaaaatatcataacgaaattttactagaagcc 

cttaaaaaatatggacatgtgattaattttgatgaagtttttttcgagggatatgtcgtt
ctaaacgcaaatagtaaagttgcaaaaattaccaaagcttatactcaattaggtatagaa 
gatgtcgaagcatatgcacaaatggcagaagaattatatcgatttccaatcatgtacgta 
gaatatagtggcacatatggagatgttgataaggttaaagcgattgcaaatatgcttcaa 
catactcaattattttatggcggtggtataacaaacattgacaaagctaacgaaatgtct 

aacattgcggataccattgttgtcggcgatattatatataacgacattaaaaaagcatta
aaaactgtaaagataaaggagtctaataaatga 

Sequence 1228 

MYDITKWKHMFKLDPAKSISDENLEALCMSNTDAIIIGGTDDVTEDNVIHLMSRVRRYPL
PLVLEVSNVESVMPGFDFYFIPTVMNSKDTKYHNEILLEALKKYGHVINFDEVFFEGYVV
LNANSKVAKITKAYTQLGIEDVEAYAQMAEELYRFPIMYVEYSGTYGDVDKVKAIANMLQ
HTQLFYGGGITNIDKANEMSNIADTIVVGDIIYNDIKKALKTVKIKESNK*

Sequence 1229
Contig_0573_pos_8314_6146,

is similar to (with p-value 0.0e+00) 

>sp:sp|Q53727|PCRA_STAAU ATP-DEPENDENT HELICASE PCRA (EC 3.6.1.-).
>pir:pir|S39923|S39923 DNA helicase pcrA - Staphylococcus aureus
>gp:gp|M63176|STAPCRA_2 Staphylococcus aureus helicase required for
T181 replication (pcrA) gene, complete cds. NID: g153060.

atgaattcagagcaaagtgaagcggttagaacaacagaaggcccattgcttattatggca 
ggtgctggatcaggaaagacacgtgtgttaacacatcgcattgcgtatttattagatgaa 

aaagatgtatcaccttataatattttagctattacgtttacaaataaagcagctaaagaa
atgaaggcgcgtgtcgaacatcttgtgggagaagaagcgcaagtgatttggatgtccact 
tttcactctatgtgtgtaagaattctgagaagagatgctgatcgtattggcattgaaaga
aatttcactatcattgatcctaccgatcaaaaatcagtgattaaagatgtattgaaaagt 
gaaaatatagacagtaagcgatttgagccacgtatgtttattggtgcaattagcaatttg 

aaaaatgaattaaaaacacctgaggatgctcaaaaagaggcgaatgattttcactctcaa
atggttgcaacggtttacaaaggttatcaaagacagttatcacgtaatgaagcactcgac 
tttgatgatttaattatgacaactattaatttatttgaacgtgtacccgaaactctagaa 
tactatcaaaataaatttcaatatatacatgtagatgagtatcaagataccaataaagca 
caatataccttagtaaaactattagcaaacaaatttaaaaatttatgtgttgttggtgat

tctgaccaatctatttatggttggagaggagctgatatacaaaatattttatcttttgaa
gaggactatcctgaggcaaagacaattttcctcgaacagaactatcgttcaactaagaat 
attttaaatgctgcaaatgaagttataaaacataattctgaacgtaaacctaaaggtcta 
tggactgcaaattctggaggagacaaaattcagtattatgaagctatgactgaaagagat 
gaagcagaatacgttgttaaagaaataatgaagcatcaacgcagtggtaaaaaatatagt 

gaaatggctatattatatagaacaaatgcccaatcacgtgtacttgaggaaacatttatg
aaatcaaatattccttatacaatggttgggggtcaaaagttctatgaccgtaaagaaatt 
aaagatttacttagttatttaagagttattgctaatagcaatgatgatattagtttgcaa 
cgtattattaacgtgcctaaacgtggtattggaccttcatctgttgaaaaaatccaaacc
tatgcacttcaaaataatataagtatgtttgacgcattggctgaggtagattttataggt 

ctctctaaaaaggtaactcaagaatgtatcagtttttatgaaatgattcaaaatttaatc
aaagaacaagaatttctcgaaattagtgaaatcgtagatgaagtactacaaaaatcaggc 
tatagagacatgcttgatcgagaacaaagtattgaatcacgaagtcgattagaaaactta 
gatgaatttatgtctgtacctaaagattatgaggaaaatactcctttagaggaacaatca 
cttattaattttctaacagatttatcattagttgctgatattgacgaagcagatacacag 

aatggtgtaacattgatgacaatgcattcagcaaaaggtcttgaatttcctatagttttt
attatgggaatggaggagtcgttgttcccacatatcagagcaataaaaagtgaagatgat 
catgaaatggaagaggaacgtcgtatttgttatgtagcaattacacgagcagaagagttg 
ctttatatcacaaatgcaacgaccagaatgttgtttggtcgttctcaatccaatatgcca 
tctcgatttttaaaagaaatcccagaagacctacttgatagtcataccggtcaaaaaaga 

caaactatatctcccaaatctcaacctaaaagaggttttagtaagcgtactacatcaact
aaaaaacaagtttcatcatctgattggaaagtaggagataaagttatgcataaagcatgg 
ggtgaagggatggttagtaacgtgaatgaaaaaaatggatctgtagagttggatattata 
tttaaatcagaaggtccaaaacgattattagctcagttcgcaccaataacaaagaaggag 
gactcatag 


Sequence 1230 

MNSEQSEAVRTTEGPLLIMAGAGSGKTRVLTHRIAYLLDEKDVSPYNILAITFTNKAAKE 
MKARVEHLVGEEAQVIWMSTFHSMCVRILRRDADRIGIERNFTIIDPTDQKSVIKDVLKS 
ENIDSKRFEPRMFIGAISNLKNELKTPEDAQKEANDFHSQMVATVYKGYQRQLSRNEALD 

FDDLIMTTINLFERVPETLEYYQNKFQYIHVDEYQDTNKAQYTLVKLLANKFKNLCVVGD
SDQSIYGWRGADIQNILSFEEDYPEAKTIFLEQNYRSTKNILNAANEVIKHNSERKPKGL
WTANSGGDKIQYYEAMTERDEAEYVVKEIMKHQRSGKKYSEMAILYRTNAQSRVLEETFM
KSNIPYTMVGGQKFYDRKEIKDLLSYLRVIANSNDDISLQRIINVPKRGIGPSSVEKIQT 
YALQNNISMFDALAEVDFIGLSKKVTQECISFYEMIQNLIKEQEFLEISEIVDEVLQKSG 

55 YRDMLDREQSIESRSRLENLDEFMSVPKDYEENTPLEEQSLINFLTDLSLVADIDEADTQ 
NGVTLMTMHSAKGLEFPIVFIMGMEESLFPHIRAIKSEDDHEMEEERRICYVAITPAEEL 
LYITNATTRMLFGRSQSNMPSRFLKEIPEDLLDSHTGQKRQTISPKSQPKRGFSKRTTST 
KKQVSSSDWKVGDKVMHKAWGEGMVSNVNEKNGSVELDIIFKSEGPKRLLAQFAPITKKE
DS* 
Sequence 1231

Contig_0573_pos_6142_4145,

is similar to (with p-value 0.0e+00)

>gp:gp|AJ011676|BST011676_1 Bacillus stearothermophilus lig
gene. NID: g3688228.

atgcaagatgttaaaaagcgtgtggaaaaattacatgacttattgaatcaatatagttat 
gaatattatgtacaagataatccctcagtgcctgacagtgagtatgataagttattacat 
gagctgattgaaattgaagaaaaatatccagaattcaaatcgacagactctccaacagtg 

10 cgtgtgggtggcgaagctcagtcttcttttgaaaaagtaaatcacgacacgcctatgtta 
agtttaggtaatgcttttaatgaagaagatttaagaaaatttgatcaacgtattcgtgat 
agtattggtaaggtcgaatacatgtgtgaacttaaaatagatggtttggctgtttcgctc 
aaatatgaaaatggtcgttttgttcaaggacttacacgtggtgatggtacgacaggtgag 
gatatcactgaaaatctaagaactatacatgctataccactaaaaattaaagaacctctc 

15 aattttgaggtccgtggggaagcttatatgccacgtcgttcattcattcatttgaataat 
gaaaaagaacaaaatggtgaacaaccttttgcaaatccacgaaacgctgctgcaggctct 
ttaagacaacttgactctaaactagctgcgaaaagaaagttaagcgtcttcttatatagt 
gtgaatgacctaaccgagtttaatgcaacaacacaaagtgaagcgctagaggaattggac 
caattaggttttaaaactaaccaagaacgtgaacgagtatcagatattgagggcgtactt 

20 aattatatagagaaatggacaagcaaaagaggatctttatcttacgatattgatggtatt 
gttataaaagttaacgatttatctcaacaagaggaaatgggttatacgcaaaaatctcca 
agatgggcgattgcttataaatttccagctgaagaagttattacaaaattattggatatt 
gagctaagtattgggcgtacgggtgttgtgacaccaactgcaattctagaacctgtaaaa 
gtagctggtactacagtttcaagagcctcacttcataatgaagatttaatacatgaaaga 

25 gatatacgtatcggagatagtgttgttattaaaaaagccggggacatcatccctgaagtt 
gtaaaaagtattttagatagacgacctaacgaatcggaaatttatcatatgccaacacat 
tgtcctagttgtggacatgaattagttcgtattgaaggagaagttgctttacgttgtatt 
aatccaaaatgtcaggcacagcttattgaaggacttatacatttcgtttcaagacaagcg 
atgaatatagatggtttaggtactaaaattattcatcagctatacgaaaatcagttaatc 

aaagatgtcgcagatattttctatttgaaagaagaagatttattaccattagagcgaatg
ggaaagaagaaagttgataatcttttattagcgatagaaaaatctaaagaacagtcatta 
gagcatttattatttggacttggtattagacatttaggtgtaaaagctagtcaagtactt 
gctgagcgatatgaaacgatggatcaactttttaaagtaactgaaagtgaattaattgaa 
attcaagatattggagataaacttgcacaatctgttgtaacatatctcgaaaatagtgat 

35 attcgttcattaattgaaaaattaagtaataaaaatgttaatatgtcttataaaggaatt 
aaaacaactgaaatcgaaggtcatcctgattttagtgggaaaacaattgtattaacaggg 
aaactcgagcaaatgacgagaaatgaagcatctgaatggttgaaaatgcaaggtgctaaa 
gttacaagcagcgtgactaaaagtactgatattgtcatagctggagcagatgcagggtct 
aaattagccaaagctgagaagtatggtactgaaatttggactgaagcagcatttattgaa 

40 aaacaaaatggaatctaa 
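
The entry headers in this listing appear to follow the pattern Contig_<id>_pos_<start>_<end>. The following Python sketch shows one way to read such a header and derive the implied sequence length. The helper name `parse_contig_header` is hypothetical, and treating start > end as a reverse-strand reading frame is an assumption that the listing itself does not state. For Contig_0573_pos_6142_4145 the sketch gives 1998 nucleotides, which agrees with the length of Sequence 1231 as printed above.

```python
import re

# Assumed header layout: Contig_<id>_pos_<start>_<end>.
# Assumption: start > end means the frame lies on the reverse strand.
HEADER_RE = re.compile(r"Contig_(\d+)_pos_(\d+)_(\d+)")


def parse_contig_header(header):
    """Return (contig_id, start, end, strand, length_nt) for a header string."""
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError(f"unrecognised header: {header!r}")
    contig_id = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    strand = "+" if start <= end else "-"
    # Coordinates are treated as inclusive, so the span is |end - start| + 1.
    length_nt = abs(end - start) + 1
    return contig_id, start, end, strand, length_nt


print(parse_contig_header("Contig_0573_pos_6142_4145,"))
# ('0573', 6142, 4145, '-', 1998)
```

A length that is a multiple of three (here 666 codons, the last being a stop codon) is consistent with the 665-residue translation given as Sequence 1232.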

Sequence 1232 

MQDVKKRVEKLHDLLNQYSYEYYVQDNPSVPDSEYDKLLHELIEIEEKYPEFKSTDSPTV 
RVGGEAQSSFEKVNHDTPMLSLGNAFNEEDLRKFDQRIRDSIGKVEYMCELKIDGLAVSL 

KYENGRFVQGLTRGDGTTGEDITENLRTIHAIPLKIKEPLNFEVRGEAYMPRRSFIHLNN
EKEQNGEQPFANPRNAAAGSLRQLDSKLAAKRKLSVFLYSVNDLTEFNATTQSEALEELD
QLGFKTNQERERVSDIEGVLNYIEKWTSKRGSLSYDIDGIVIKVNDLSQQEEMGYTQKSF 
RWAIAYKFPAEEVITKLLDIELSIGRTGVVTPTAILEPVKVAGTTVSRASLHNEDLIHER 
DIRIGDSVVIKKAGDIIPEVVKSILDRRPNESEIYHMPTHCPSCGHELVRIEGEVALRCI

50 NPKCQAQLIEGLIHFVSRQAMNIDGLGTKIIHQLYENQLIKDVADIFYLKEEDLLPLERM 
GKKKVDNLLLAIEKSKEQSLEHLLFGLGIRHLGVKASQVLAERYETMDQLFKVTESELIE 
IQDIGDKLAQSVVTYLENSDIRSLIEKLSNKNVNMSYKGIKTTEIEGHPDFSGKTIVLTG 
KLEQMTRNEASEWLKMQGAKVTSSVTKSTDIVIAGADAGSKLAKAEKYGTEIWTEAAFIE
KQNGI* 


Sequence 1233 

Contig_0573_pos_3735_3064,

putative peptide of unknown function 

atgagcgaaaaagaaaagaaaagcaaaaatgctaatgagaatcttggactcaatccatat



WO 01/34809    PCT/US00/30782



tctacagataagggaatgtcaatggataaagctacaaaatatgcacttcaattaagggtg 
attgctcaagaaagctatcctaatgtcgatagacattcctttctagtcgaaaaattacgc 
ttacttattcaacaattaaaacaatctattcatcatctcaaacaattagatgatgccatg
attcaattagcacaacaactcgattattttgaaaatattcattcgatacctggtattggt 
5 aagctaagcacagctatgattattggggagattggtgatattaagcgatttaaatcaaat 
aaacaactcaatgcttttgttggcattgatatcaaacgatatcaatcaggtcatacacac 
tgtagagataccatcaacaagcgtggtaataaaaaagcgagaaaacttttattttgggtg 
attatgaatataataagagggcagcatcattatgacaatcatgtcgtcgattattactac 
* aaactaagaaagcagcctaatgagaaacctcataagactgccatcattgcttgtataaat 
10 cgattattaaaaacaattcattatcttgtaatgaatcataaattgtacgattatcaaatg 
tcaccacattag 

Sequence 1234 

MSEKEKKSKNANENLGLNPYSTDKGMSMDKATKYALQLRVIAQESYPNVDRHSFLVEKLR 
LLIQQLKQSIHHLKQLDDAMIQLAQQLDYFENIHSIPGIGKLSTAMIIGEIGDIKRFKSN
KQLNAFVGIDIKRYQSGHTHCRDTINKRGNKKARKLLFWVIMNIIRGQHHYDNHVVDYYY
KLRKQPNEKPHKTAIIACINRLLKTIHYLVMNHKLYDYQMSPH* 

Sequence 1235 
Contig_0573_pos_2231_1056,

is similar to (with p-value 2.0e-41)

>gp:gp|U09991|SVU09991_1 Streptomyces venezuelae ISP5230 chl
oramphenicol resistance protein (cmlv) and chloramphenicol
phosphotransferase genes, complete cds. NID: g498886.

25 atgaaagataataaaatgttgttcattatttttatgataggaacatttacagtaggaatg 
gctgaatatgtagtgacaggattacttacacaaatcgctgacgatatgaaggtttctatt
tcgagtgcaggtttattaattagtgtttatgctattagtgttgcattgatagggccttta 
atgcgaatcataacattgaaagttcacgcccaccgtctgttaccgattttagttgcgatt 
tttataataagtaatttagtgggaatgttagcaccgaattttaatgtattgttattatca 

30 agactcatgtctgcggcaatgcatgcgccattcttcggtgtgtgtatgagtgttgctgcg 
acagtcgcacctcctgctaaaaaaacacaggccattgcacttgttcaggcaggtttaact 
attgctgtaatgttaggtgtaccattcggatcatttttaggtggctttgcaaattggaga
gttgtttttggattcatgattgtgttggcaatcattactatgttaggaatgattaaattt
gttccaaatgtttctttaagtgcagaagcaaatattagcaaagaattaacagtgtttaag 

aatccacacattttaattgtgattgcaattattgtgtttggttactctggtgtgtttact
acttatacatttatggagccaatgatacgagatttttctccatttaaaattgtaggttta 
actgtttgtttatttatgtttggtctaggcggtgtgatagggaatttaattactggtaat 
gtaccggaagataaattaacaaaaaatttataccttacatttcttttactatttgtaaca 
atcatactatttgttactgttattcaaaattcaatattagcattaatcatttgcttctta 

40 ttcggttttggtacatttggtacaacaccgttacttaatagcaaaattatcttaagtgca 
aaagaagcaccacttcttgcaagtacgttagctgcttctattttcaatgttgctaatttt
cttggtgcaatcattggatctatattattatcaatagggttaccttacattcaaattact
ttgatatctggtgggattatagtgttgggtatgcttcttaatcttgttaatcaactttat 
gaaaagaaacatatcacatttaatgaatattcatga 


Sequence 1236 

MKDNKMLFIIFMIGTFTVGMAEYVVTGLLTQIADDMKVSISSAGLLISVYAISVALIGPL
MRIITLKVHAHRLLPILVAIFIISNLVGMLAPNFNVLLLSRLMSAAMHAPFFGVCMSVAA
TVAPPAKKTQAIALVQAGLTIAVMLGVPFGSFLGGFANWRVVFGFMIVLAIITMLGMIKF
VPNVSLSAEANISKELTVFKNPHILIVIAIIVFGYSGVFTTYTFMEPMIRDFSPFKIVGL
TVCLFMFGLGGVIGNLITGNVPEDKLTKNLYLTFLLLFVTIILFVTVIQNSILALIICFL
FGFGTFGTTPLLNSKIILSAKEAPLLASTLAASIFNVANFLGAIIGSILLSIGLPYIQIT
LISGGIIVLGMLLNLVNQLYEKKHITFNEYS*

Sequence 1237

Contig_0575_pos_141_485,

putative peptide of unknown function 

gtgacaaaccggaggaaggtggggatgacgtcaaatcatcatgccccttatgatttgggc 
tacacacgtgctacaatggacaatacaaagggcagcgaaaccgcgaggtcaagcaaatcc 
cataaagttgttctcagttcggattgtagtctgcaactcgactatatgaagctggaatcg
ctagtaatcgtagatcagcatgctacggtgaatacgttcccgggtcttgtacacaccgcc 
cgtcacaccacgagagtttgtaacacccgaagccggtggagtaaccatttggagctagcc 
gtcgaaggtgggacaaatgattggggtgaagtcgtaacaaggtag 


Sequence 1238 

VTNRRKVGMTSNHHAPYDLGYTRATMDNTKGSETARSSKSHKVVLSSDCSLQLDYMKLES 
LVIVDQHATVNTFPGLVHTARHTTRVCNTRSRWSNHLELAVEGGTNDWGEVVTR*

Sequence 1239

Cont ig_057 5_pos_4 020 0 , 

is similar to (with p-value 1.0e-63) 

>sp:sp|P00497|PUR1_BACSU AMIDOPHOSPHORIBOSYLTRANSFERASE PREC
URSOR (EC 2.4.2.14) (GLUTAMINE PHOSPHORIBOSYLPYROPHOSPHATE A
MIDOTRANSFERASE) (ATASE). >pir:pir|A00582|XQBS amidophosphor
ibosyltransferase (EC 2.4.2.14) - Bacillus subtilis >gp:gp|J
02732|BACPURF_7 B. subtilis pur operon encoding purine biosyn
thesis enzymes, 12 genes. NID: g143363. >gp:gp|Z99107|BSUB00
04_97 Bacillus subtilis complete genome (section 4 of 21): f
rom 600701 to 813890. NID: g2632866.

atggagaacggtgcttatattttagcaagtgaaacatgtgcgattgatgttttaggtgct 
gaatttatacaagatattcatgcaggtgagtatgttgttattacggatgaaggtatagaa 
gttaagacttacacacgacaaacaacaactgcaatttcagctatggaatatatttatttt 
gcgagacctgattcaacgattgcaggaaaaaatgttcatgcggtacgaaaggcatcaggt 

25 aaacggttagcacaggaaaacccagcaaaagcagatatggtaataggcgtacctaattca 
tcattatctgcagcaagtggttatgctgaagaaataggcctaccatatgaaatgggacta
gttaaaaatcaatatgttgctcgaacttttatacaacctactcaggaattaagagagcaa 
ggtgtacgtgtgaaactgtcggctgttaaggatattgttgatggtaaagatatcgtactt 
gtagatgattcgattgttcgaggtacaacgattaaacgcatagttaaaatgcttaaggat 

30 tcaggagctaaccgcattcacgtaagaattgcttctccc 

Sequence 1240

MENGAYILASETCAIDVLGAEFIQDIHAGEYVVITDEGIEVKTYTRQTTTAISAMEYIYF
ARPDSTIAGKNVHAVRKASGKRLAQENPAKADMVIGVPNSSLSAASGYAEEIGLPYEMGL
VKNQYVARTFIQPTQELREQGVRVKLSAVKDIVDGKDIVLVDDSIVRGTTIKRIVKMLKD
SGANRIHVRIASP 

Sequence 1241 

Contig_0577_pos_8441_9043,

putative peptide of unknown function

atgaaaaatgtttctaaagctttgatttggtttgttataagcttcatcatctttcacgca 
atattatttgtgatgtggggagaacatcaagaatactggtatttatatactggcattatg 
ttaatagctggaataagttatgttttttaccaaagagacattgcatctaaacgattatta 
acttccataggcatgggtataataacgagtgtcgcacttattattatacaattaattttt 

45 tcacttatttcatcagaattatcatacgcatctttaatcaaagaattatcacgaacgggt 
gtctactttaaatggcaaatgctcgttactttattatttgtgataccttgtcatgaatta 
tatatgagaactgttttacaaaaggaattaataaaatataacttaccgaaatgggctagc 
attttaattgttgcaatatgttcaagttcattatttatatacttagataattggtggatt 
gtattctttatttttgtagctcaattcattctatctcttagctatgaatatacgagacgt 

50 attgctacgactacaattggtcaaattgtggctatcattttattattgatattccacgga 
taa 

Sequence 1242 

MKNVSKALIWFVISFIIFHAILFVMWGEHQEYWYLYTGIMLIAGISYVFYQRDIASKRLL
TSIGMGIITSVALIIIQLIFSLISSELSYASLIKELSRTGVYFKWQMLVTLLFVIPCHEL
YMRTVLQKELIKYNLPKWASILIVAICSSSLFIYLDNWWIVFFIFVAQFILSLSYEYTRR 
IATTTIGQIVAIILLLIFHG* 

Sequence 1243 
Contig_0577_pos_9746_10939,

is similar to (with p-value 2.0e-65)

>sp:sp|P37487|YYBQ_BACSU HYPOTHETICAL 34.0 KD PROTEIN IN COT
F-TETB INTERGENIC REGION. >gp:gp|D26185|BAC180K_14 B. subtil
is DNA, 180 kilobase region of replication origin. NID: g467
326. >gp:gp|Z99124|BSUB0021_160 Bacillus subtilis complete g
enome (section 21 of 21): from 3999281 to 4214814. NID: g263
6442.

atggaccatagttccgcttcgaaaaaattaattaaagatatagagcaaaatcagtatgta 

10 acagtaaaacatttatctcatgatgatttttatattgatgatttggtaaaaaagaaagaa 
gtcattgcaagccttgaaatacctaaggatttctcaaaacaccttaaagataatgattta 
aataagactcttccattatatagcagagatgattttataggacatattgctatggaaata 
atcagtcgatcattatacgaacaacaaatccctaatattattcatgaacatcttgatgat 
atgaaacaaccacaatccttagataaagtgaaacaatcttattattcgcttacacctcaa 

15 tctaaaataaaaagtgtagctatcaataaacatgctcatcaatccatttcaattggcatt 
gtatttgttgtcgtcatctttgtaagtgttatccaaatcctattacatcaacgtcttaaa 
cagaacgcacctctcgaaagattatatttggtaccttatagtcaacttaaactatacttg 
acttatatcagtgtacacagaatttctaactttgaaacagctggccctttatactataga 
gcagaaccagttggttgtagtgcaacaattttatataaaatgtataaagaacgtggattt 

20 gaaattaaaccagaaatcgctggacttatgatttcagctataatttctgatagtttatta 
tttaaatcacctacctgcacaaaagaagatgtagatgctgctcaagcacttaaagatatt 
gcaaatgttgatttagaagcatatggtttagaaatgttaaaagcaggtgcttcaactaca 
gataaatctgctgaaacacttgtcaatatggatgctaaatcattcaatatgggagattat 
gtaacacgtattgctcaagtcaatactgtagatattgatgaagttttagatcgtaaagag 

25 gaatttgaaaaagttatgttagaaatgagtgccaatgaaaaatacgatttattcgttctt 
gtagttactgatattattaacagcgattctaaaatccttgtagttggtgctgaaaaagat 
aaagttggagaagcatttaaagtacaactagatgatggtatggctttcttatctggcgtt 
gtatcacgaaaaaaacaagttgttcctcaaatcactgaagttttaactcaataa 

Sequence 1244

MDHSSASKKLIKDIEQNQYVTVKHLSHDDFYIDDLVKKKEVIASLEIPKDFSKHLKDNDL 
NKTLPLYSRDDFIGHIAMEIISRSLYEQQIPNIIHEHLDDMKQPQSLDKVKQSYYSLTPQ
SKIKSVAINKHAHQSISIGIVFVVVIFVSVIQILLHQRLKQNAPLERLYLVPYSQLKLYL
TYISVHRISNFETAGPLYYRAEPVGCSATILYKMYKERGFEIKPEIAGLMISAIISDSLL
FKSPTCTKEDVDAAQALKDIANVDLEAYGLEMLKAGASTTDKSAETLVNMDAKSFNMGDY
VTRIAQVNTVDIDEVLDRKEEFEKVMLEMSANEKYDLFVLVVTDIINSDSKILVVGAEKD
KVGEAFKVQLDDGMAFLSGVVSRKKQVVPQITEVLTQ*

Sequence 1245 
Contig_0577_pos_11239_12618,

is similar to (with p-value 0.0e+00)

>sp:sp|P39616|DHA2_BACSU PROBABLE ALDEHYDE DEHYDROGENASE YWD
H (EC 1.2.1.3). >pir:pir|S39713|S39713 hypothetical protein
- Bacillus subtilis >gp:gp|X73124|BSGENR_59 B. subtilis genom
ic region (325 to 333). NID: g413923. >gp:gp|Z99123|BSUB0020
_91 Bacillus subtilis complete genome (section 20 of 21): fr
om 3798401 to 4010550. NID: g2636240.

atgacaataattagagataaatttaacaatagtaaagctttttttaatacgcataaaaca 
aaaaaccttaaatttcgaaaacaacaacttaaattactaagtaaaaatatcaaaaatcat 

50 gaaaatgaattattagatgccttatataaagatttaggtaaaagtaaggttgaagcatac 
gcaactgaaattggtatgcttttgaaaagcataaagctaatgcgcaaagagttaaaaaat 
tggtcgaaaaccaaacaaacggatacaccactctacttattccctacaaagagttatatt
aaaaaagaaccttacggtacggtgcttattataggaccatttaattatccggttcaatta 
gttttcgagcctctcatcggagcaatagctgccggaaatactgctatagttaaaccttca 

55 gagttaacacctcatgttgccattgtgatcaaggacatcattgaagatacatttgatgaa 
gcatacgtttctgttgttgaaggtggtattgaagaaacccaaacgttattaagtctacca 
tttgattatatgttctttactggcagtgaaaaagtcggaaaaattgtctatgaagctgca 
gcaagaaaattaattccagttactcttgaacttggcggtaaatcacctgtcattgtcgat 
gatacagccaatatcaaagtagcgagtgaacgtattagttttggtaaatttactaatgct 
ggtcaaacatgtgtcgctccagattatatattagttcagcggaaagttaaaaatgattta
ataaaagctcttaaaaaaacaattactgaattttacggagaaaatattgaaaaaagccct 
gatttcggacggattgttaatcaaaaacactttaatcggttgaatgacttgattcaaatt 
cataaagataatgttgtttttggaggtaatagttctaaagaagatttatatattgaacct 
5 actttattggataacataaccaatgacaataaaatcatgaaagaagaaatattcggtccc 
attttgcctattattacttatgataatttcgatgaagtacttgaaatcatccaaagtaaa 
tcaaaaccactaagtttgtatctttttagcgaagatgaaaacatgacacatagagtggtt 
gaagaattatcatttgggggcggtgcaattaacgatacgttaatgcatttagctaatcct 
aacttacctttcggtggtgtaggttcttcaggcataggtcaatatcatggtaagtattct 
10 tttgatacatttagtcatatgaaatcatacacatttaaatctacacgtctagaatcgagt 
ttatttttccctccatataaaggtaaatttaaatatattaaaaccttcttcaagaactag 



Sequence 1246

15 MTIIRDKFNNSKAFFNTHKTKNLKFRKQQLKLLSKNIKNHENELLDALYKDLGKSKVEAY 
ATEIGMLLKSIKLMRKELKNWSKTKQTDTPLYLFPTKSYIKKEPYGTVLIIGPFNYPVQL 
VFEPLIGAIAAGNTAIVKPSELTPHVAIVIKDIIEDTFDEAYVSVVEGGIEETQTLLSLP 
FDYMFFTGSEKVGKIVYEAAARKLIPVTLELGGKSPVIVDDTANIKVASERISFGKFTNA 
GQTCVAPDYILVQRKVKNDLIKALKKTITEFYGENIEKSPDFGRIVNQKHFNRLNDLIQI
HKDNVVFGGNSSKEDLYIEPTLLDNITNDNKIMKEEIFGPILPIITYDNFDEVLEIIQSK
SKPLSLYLFSEDENMTHRVVEELSFGGGAINDTLMHLANPNLPFGGVGSSGIGQYHGKYS 
FDTFSHMKSYTFKSTRLESSLFFPPYKGKFKYIKTFFKN* 

Sequence 1247 
Contig_0577_pos_7994_7059,

is similar to (with p-value 8.0e-44)

>gp:gp|U62057|MCU62057_2 Mycoplasma capricolum NADH oxidase
(naox) gene, partial cds, and lipoate-protein ligase (lpla),
pyruvate dehydrogenase EI alpha subunit (odpa), pyruvate de
hydrogenase EI beta subunit (odpb), pyruvate dehydrogenase E
II (odp2), dihydrolipoamide dehydrogenase (dldh), phosphotra
nsacetylase (pta) and acetate kinase (ack) genes, complete c
ds. NID: g1480703.

atggaagagtatgttcttaaaaatttaccttctgaagaaagttattttttattttatatt 
35 aacagaccttcaattattgttggaaagaatcagaatacaattgaagaagttaatcaagcg 
tatattgataaacatcaaatagatgtagtgagacgtatttctggtggtggggctgtttat 
catgatactggaaacttaaattttagctttatcacagatgatgatggccatagctttcat 
aattttaaaaagtttacgatgcccattgtacaggccttacaatcaatgggagttaatgct 
gaaatgactggaaggaatgatatacaagtagggcaagctaaaatatctggaaatgctatg 
40 gttaaagtaaaaaatagaatgtttagtcatggtacattaatgctgaattgtgatttaaat 
gaagttcaaaaggcattaaaagtgaatccagctaaaattaaatctaaaggcgttaaatct 
gttagaaaaagagttgccaatattgaggaatttctagaacagccaatagatatagaagaa 
ttcaaaaaaattattcttaaaactatttttggtgaaaatgaagttgaagaatatatatta 
acagaagaagattggaaaaatattaagcaattaagtgatgaaaagtatcgtacgtgggaa 
45 tggaactatggcagcaatccaaaatataatattgagcgtgaagagaaatttgaaaaaggt 
tttattcaaataaaattagatgtaaaaaaaggaagaattgaacgggcaaaactatttgga 
gatttcttcggcgaaggagatgtaaccgaacttgaacatgcgttagtaggttgcttacat 
gattttgaacatatagaagaggcacttcaaaattatgatttctatcactactttggggat 
atagataagtatgaaattataagattgatgtcctaa


Sequence 1248 

MEEYVLKNLPSEESYFLFYINRPSIIVGKNQNTIEEVNQAYIDKHQIDVVRRISGGGAVY
HDTGNLNFSFITDDDGHSFHNFKKFTMPIVQALQSMGVNAEMTGRNDIQVGQAKISGNAM
VKVKNRMFSHGTLMLNCDLNEVQKALKVNPAKIKSKGVKSVRKRVANIEEFLEQPIDIEE
FKKIILKTIFGENEVEEYILTEEDWKNIKQLSDEKYRTWEWNYGSNPKYNIEREEKFEKG
FIQIKLDVKKGRIERAKLFGDFFGEGDVTELEHALVGCLHDFEHIEEALQNYDFYHYFGD
IDKYEIIRLMS*

Sequence 1249 






Contig_0577_pos_6503_5973, 

putative peptide of unknown function 

atggtaatacgacctaaatatgatgaatatcagcaaacaaatggtactgaaattatcaga 
tttgatcaaactcgcaaagaaagtccatttaaagtacagagaattatcgaaagatcatgt 
5 aaattttatggtaataattatattagtaaaaaagcagaaacgaatcgtattactggaatc 
tctagtaaaccacctattttacttacgcctctttttcctacttacttttttccaactcac 
tcagaccgtcaagaagaaaatatatggattaatatgcattatattgaaaatgttaaagaa 
cttaaaaatcgtaagagtaaaataatttttgcgaatggtgattcgttaacgctcaatgta 
tcatttcatagcttgtggcatcaatatacgaatgcaatcatctattattacatggtagat 
10 aagcaatcacgaatgaaatctaacaaccctgaacaacccattgactataatcagtcttct 
ctaaatattttcgaggcgctctcacgctactccctttttgaagaaaattag 

Sequence 1250 

MVIRPKYDEYQQTNGTEIIRFDQTRKESPFKVQRIIERSCKFYGNNYISKKAETNRITGI 
SSKPPILLTPLFPTYFFPTHSDRQEENIWINMHYIENVKELKNRKSKIIFANGDSLTLNV
SFHSLWHQYTNAIIYYYMVDKQSRMKSNNPEQPIDYNQSSLNIFEALSRYSLFEEN*

Sequence 1251 

Contig_0577_pos_3724_3383, 

putative peptide of unknown function

gtgattgcgttattatttgattggtttgttatttttagtagtattttaatttctgtaact 
tttaataacattttgatctatcttgtagctattatattaattggtagtagaatgagagca 
tttgataatttaatgcatgaagcttgtcatcgatctttatttacaaataagttttggaat 
aaatggataacttgtttatttgttgcatttccagtatttactagttatacagcatatcga 

aatgctcatcattgtactggattttttaaaactacattgcctggcttttatagtgtagtt
aaagatatttttacaccgaactatgaaaggaaagagagttag 

Sequence 1252 

VIALLFDWFVIFSSILISVTFNNILIYLVAIILIGSRMRAFDNLMHEACHRSLFTNKFWN
KWITCLFVAFPVFTSYTAYRNAHHCTGFFKTTLPGFYSVVKDIFTPNYERKES*

Sequence 1253 

Contig_0577_pos_3179_2763,

putative peptide of unknown function 

35 atgccaactttacaactttttatttacatcatcgggtttttgtttttgatgtatatggca 
tggtctttatggacggaaaagccttcaaatatcgaagaaatagaacctatgtcagctaaa
aagcagattttatttgctttatctgtttctttattaaatcttcatgcaataatggatact 
gttggtgttatcggaacgagtgcttcagtttatgatggttatgacaaagttgttttttca 
ttggctacaatttctgtatcttggatttggtttgtctttttagctattttaggaagaatt 

40 acaggaaaaattgataaaagtggtaagtatatcgttattttaaataaagtttctagcgtt 
attgttattattgtaggattaattatattaaaaaatattgttggaattttaagttag 

Sequence 1254 

MPTLQLFIYIIGFLFLMYMAWSLWTEKPSNIEEIEPMSAKKQILFALSVSLLNLHAIMDT 
45 VGVIGTSASVYDGYDKVVFSLATISVSWIWFVFLAILGRITGKIDKSGKYIVILNKVSSV 
IVIIVGLIILKNIVGILS* 

Sequence 1255 
Contig_0577_pos_1303_881, 

putative peptide of unknown function

gtgaatgtgagatatcggtcgttttatgagcatatttgggtatatgagaaagacggggtt 
gttgctggatgtgtaattgcgtatcctgggaaagaggaaatggattttgaacaacaatgg
cttaagttaccacttgaagaagatatccttcagttaggtacaccattacctgaaaaagaa 
tcatacgatgatgaaatatatatagaagcagtagtaacgactccaaaatatcgaggacaa 

55 ggtattgcaacacaacttttaaagtatgtaatttccactcatgcacatgaaaaatgggga 
ttgaattgtgattatgataataataaagcacgccacctatatcacaaattagggtttaaa 
gaggatgcgacaattcgtttatatggccatcaatattttcatatgacattgaataataag 
taa 
Sequence 1256

VNVRYRSFYEHIWVYEKDGVVAGCVIAYPGKEEMDFEQQWLKLPLEEDILQLGTPLPEKE
SYDDEIYIEAVVTTPKYRGQGIATQLLKYVISTHAHEKWGLNCDYDNNKARHLYHKLGFK
EDATIRLYGHQYFHMTLNNK* 
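
Each nucleotide sequence in this listing is followed by its conceptual translation, so the two can be cross-checked codon by codon. The following Python sketch shows one way to do that, assuming the standard genetic code and a literal reading of every codon (so an initial gtg is rendered as V, as in the translations printed here). The names `CODON_TABLE` and `translate` are hypothetical and are not taken from the source.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a nucleotide string codon by codon; '*' marks a stop codon."""
    dna = "".join(dna.split()).lower()  # drop layout whitespace
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# First 30 nucleotides of Sequence 1255, as printed in the listing.
print(translate("gtgaatgtgagatatcggtcgttttatgag"))  # VNVRYRSFYE
```

Comparing such a translation with the printed peptide is one way to spot character-recognition errors in either sequence, for example a doubled V that has been read as W.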


Sequence 1257 

Contig_0578_pos_404_2398, 

is similar to (with p-value 0.0e+00) 

>sp:sp|P50620|RIR1_BACSU RIBONUCLEOSIDE-DIPHOSPHATE REDUCTAS
E ALPHA CHAIN (EC 1.17.4.1) (RIBONUCLEOTIDE REDUCTASE). >gp:
gp|Z68500|BSNRDYMA_2 B.subtilis cwlC, nrdE, nrdF, ymaA and y
maB genes. NID: g1154630. >gp:gp|Z99113|BSUB0010_32 Bacillus
subtilis complete genome (section 10 of 21): from 1781201 t
o 2014980. NID: g2634090.

15 gtgtatcttgaagaaatccatgataaaatgatttcttttgatgatgaaatcgaaagactt 
cattatcttgttgataataatttttattttaatgttttcgaaaaatatagcgaagcagaa 
ttaattgaaattactgaatacgcaaaatcaattcacttccaatttgctagttatatgtcg 
gcaagtaagttttataaggattacgctttaaaaacgaatgataaaactaagtttcttgag 
gattataatcagcacgtcgcaatcgttgcactttatttagcgaatggtaatgttaaacaa 

20 gcaaaacaatttatttctgcaatggtggaacaacgttatcaaccggcgacaccaacattc 
ttgaatgcaggtagagcaagacgtggagaacttgtttcatgtttcttattagaagtagac 
gatagcttaaattctatcaatttcattgattcaactgcaaaacaacttagtaaaattggt 
ggtggtgtggccattaatttatcgaaacttcgtgcacgtggagaagcaattaaaggaatt
aaaggtgtagctaaaggtgttttacctgttgctaaggcacttgaaggtggatttagttat 

25 gcagatcaattaggacaacgtcctggcgctggggcagtgtacttaaatattttccattac 
gatgttgaagagtttttagatactaagaaagtgaatgcagatgaagatttacgtttatct 
actatttcgactggtcttattgtaccttctaaattctttgatttggctaaagagggtaaa 
gatttctatatgtttgcacctcatacagttaaacaagaatacggtgtgactttagatgat 
attgatttagaaaagtattacgacgatatggttgcaaaccctaatatcgataaaaagaaa 

30 aaagacgcacgtgaaatgttaaatatgattgcacaaacacaattacagtctggttatcca 
tatcttatgtttaaagataatgcaaacaaagtacatgcgaattcaaatattgggcaaatt
aaaatgagtaatttatgtactgaaattttccaattacaagagacatcagtaattaacgac 
tatggaattgaagatgaaattaaaagagatatttcatgtaacttaggctcattgaatata 
gtgaatgttatggaatcaggtaaattcagagattctgtgttcacaggtatggatgctctt 

35 acagttgtaagtgatgaagcaaatattcaaaatgcaccaggtgtaaaaaaagcgaatagt 
gaactacattctgttggactaggagtaatgaacttacatggttatctagctaaaaataaa 
attggctatgaatctgaagaagcaaaagactttgctaatatattctttatgattatgaac 
tattattccatcgaacgttcaatggaaattgcaaaagagcgtggagaaaagtatcaagac 
tttgagcaatcagactatgcaaatggtaaatattttgaattctatacatctcaagaattt

40 gaaccaaaatttgaaaaggttcgccaactttttgatggtatcgatatacctacttcaaat 
gattggaaagaattgcaaaataaagtagaacaatatggactttatcacgcttatagatta 
gctattgctccgactcaaagtatttcttatgttcaaaatgcgacaagttctgttatgcct 
attgttgaccaaattgaacgtcgtacgtatggaaatgcggaaacattttatccaatgcca 
tttttatcaccagaaacaatgtggtactacaaatcagcatttaatactgaccaaatgaaa 

45 cttattgacttagtagctacaattcaaactcacgtagaccaaggtatttcaacgatactt 
tatgttaattcagaaatttcaacacgtgaactttcaagattatatgtatatgcacaccat 
aaaggccttaaatctttatactatacacgtaataaattattaagtgtggaagaatgtaca 
agttgtgcgatttaa 

Sequence 1258

VYLEEIHDKMISFDDEIERLHYLVDNNFYFNVFEKYSEAELIEITEYAKSIHFQFASYMS 
ASKFYKDYALKTNDKTKFLEDYNQHVAIVALYLANGNVKQAKQFISAMVEQRYQPATPTF 
LNAGRARRGELVSCFLLEVDDSLNSINFIDSTAKQLSKIGGGVAINLSKLRARGEAIKGI 
KGVAKGVLPVAKALEGGFSYADQLGQRPGAGAVYLNIFHYDVEEFLDTKKVNADEDLRLS 

55 TISTGLIVPSKFFDLAKEGKDFYMFAPHTVKQEYGVTLDDIDLEKYYDDMVANPNIDKKK 
KDAREMLNMIAQTQLQSGYPYLMFKDNANKVHANSNIGQIKMSNLCTEIFQLQETSVIND 
YGIEDEIKRDISCNLGSLNIVNVMESGKFRDSVFTGMDALTVVSDEANIQNAPGVKKANS 
ELHSVGLGVMNLHGYLAKNKIGYESEEAKDFANIFFMIMNYYSIERSMEIAKERGEKYQD 
FEQSDYANGKYFEFYTSQEFEPKFEKVRQLFDGIDIPTSNDWKELQNKVEQYGLYHAYRL 






AIAPTQSISYVQNATSSVMPIVDQIERRTYGNAETFYPMPFLSPETMWYYKSAFNTDQMK
LIDLVATIQTHVDQGISTILYVNSEISTRELSRLYVYAHHKGLKSLYYTRNKLLSVEECT 
SCAI* 

Sequence 1259

Contig_0578_pos_2550_3485,

is similar to (with p-value 2.0e-84)

>sp:sp|P50621|RIR2_BACSU RIBONUCLEOSIDE-DIPHOSPHATE REDUCTAS
E BETA CHAIN (EC 1.17.4.1) (RIBONUCLEOTIDE REDUCTASE). >gp:g
p|Z68500|BSNRDYMA_3 B.subtilis cwlC, nrdE, nrdF, ymaA and ym
aB genes. NID: g1154630. >gp:gp|Z99113|BSUB0010_33 Bacillus
subtilis complete genome (section 10 of 21): from 1781201 to
2014980. NID: g2634090.

atgactaatatgttctggcgtcaaaatatctcacaaatgtgggtggaaacagaatttaaa 

15 gtatcaaaagatatagcaagttggaaaacattaacagattctgagaaaaatacttttaaa 
aaagcgcttgcaggtttaacaggtttagatacacatcaagctgatgatggtatgccatta 
atcatgcttcatactactgatttaagaaagaaagctgtttattcatttatggctatgatg 
gaacaaatccatgcgaaaagttattctcatatcttcactacattattaccatctagtgaa 
accaactatttattggatacttgggttattgaagagccacatttaaaatataaatcagat 

20 aaaattgtagaaaattaccacaaactttggggtaaagaagcatcgatttacgatcaatat 
attgctcgtgtttctagtgtattcttagaaacatttctattctattctggcttctattat 
ccattatatctcgcaggacaaggaaaaatgactacgtcaggtgaaattatacgtaagata 
cttttagatgaatctatacatggagtgttcacaggtttagatgcacaaagtctacgtaat 
gagttatctgaaagtgagaaacaaaaagctgatcaagaaatgtacaaattattaaatgaa 

25 ctttatgataatgaagtttcatatacacatttattatatgatgatattggtcttgctgaa 
gatgtattaaattatgttcgatataatggaaataaagcattatcgaatttaggttttgaa 
ccatatttcgaagagagagagtttaaccctattattgaaaatgcactagatacatctaca 
aagaaccacgatttcttctctgttaaaggtgatggctatacattggctttaaatgttgag 
cctctacgtgatgaagactttgtttttgataattaa 


Sequence 1260 

MTNMFWRQNISQMWVETEFKVSKDIASWKTLTDSEKNTFKKALAGLTGLDTHQADDGMPL 
IMLHTTDLRKKAVYSFMAMMEQIHAKSYSHIFTTLLPSSETNYLLDTWVIEEPHLKYKSD 
KIVENYHKLWGKEASIYDQYIARVSSVFLETFLFYSGFYYPLYLAGQGKMTTSGEIIRKI
LLDESIHGVFTGLDAQSLRNELSESEKQKADQEMYKLLNELYDNEVSYTHLLYDDIGLAE
DVLNYVRYNGNKALSNLGFEPYFEEREFNPIIENALDTSTKNHDFFSVKGDGYTLALNVE 
PLRDEDFVFDN* 

Sequence 1261 
Contig_0578_pos_3739_4713,

is similar to (with p-value 0.0e+00)

>gp:gp|AJ005352|SAA005352_1 Staphylococcus aureus, Sst putat
ive iron transport operon. NID: g3724154.

atgaaatttatttttaaaggttataccttatttattttattagtgattttaactattgtc 
45 tctttatttataggggtgagtcagctctctctaatagatattttccatttaagtgatgaa 
caaataaatattttgttttcgagtcgaattcccagaacagttagtattctactttcgggt 
agttcactagctttatcaggattaattatgcaacagatgatgcaaaataaatttgtaagc 
ccaacgactgctggtactatggagtgggcaaaattaggtattttaatgtcattgttgttc 
tttcctaatggtcccattttaatcaaattattatttgctgttgttctaagtattgttgga 
50 acgtttttatttgtccaattaattaatcttatccgtgtaaaagatgtaatctttgttcca 
cttttaggcattatgattggtggtattttatccagttttactacatttgtagcgttgaga 
accaatgctttacaaagcattggaaactggttaactggtaactttgcagttataacgagt 
ggtcgttttgaggtgttgtatctcacaataccattacttattttggcatttgtatttgca 
aatcattttactattgcaggtatgggaaaagactttagtcataatttaggtgtaagttat
55 gaaaaaatcattaaaatagcattattcataacagcaacgctaacagcattagttgttgtt 
actgttggaacattaccatttttaggtttaatcgtaccgaatatcatatctatctataga 
ggcgatcatttgaaaaatgctttaccacacacgcttatgctcggtgcaatttttgtttta 
attgctgatattattggtagaataattgtttacccttacgaaattaatattggattgacg 
ataggtgtatttggcacaattattttcctaatcttgctaatgaaaggtaggaaaaattat 
gcaaatgagcgctaa
Sequence 1262 

MKFIFKGYTLFILLVILTIVSLFIGVSQLSLIDIFHLSDEQINILFSSRIPRTVSILLSG 
SSLALSGLIMQQMMQNKFVSPTTAGTMEWAKLGILMSLLFFPNGPILIKLLFAVVLSIVG
TFLFVQLINLIRVKDVIFVPLLGIMIGGILSSFTTFVALRTNALQSIGNWLTGNFAVITS
GRFEVLYLTIPLLILAFVFANHFTIAGMGKDFSHNLGVSYEKIIKIALFITATLTALVVV
TVGTLPFLGLIVPNIISIYRGDHLKNALPHTLMLGAIFVLIADIIGRIIVYPYEINIGLT
IGVFGTIIFLILLMKGRKNYANER*


Sequence 1263 

Contig_0578_pos_4739_5656,

is similar to (with p-value 1.0e-85)

>gp:gp|AJ005352|SAA005352_2 Staphylococcus aureus, Sst putat
ive iron transport operon. NID: g3724154.

atggcaatttgtatggcaattttttatttattggtaggtttagattttgatatatttgaa 
tatcaattccaaagccgtttcaaaaaatttattcttatattattagtaggagcatcgata 
ggaacgtccgtagttattttccaatctattactacaaatagattattaactccttctatt 
atgggacttgattctgtgtatttatttgtcaaagtcttacctatttttattttaggtgaa 

20 caagcaactgttgttacaaatatctaccttaattttttaatcactttgatcgcaatggtg 
tttttttcattgctattatttcaagttatatttaaactaggtcacttttcagtctatttc 
atattactagttggtgtaatattaggaactttcttccgtagtattacaagttttctccaa 
cttataatgaatcctgaatcttttttagcagttcaaaatgttatgtttgctagctttgaa 
gcatcaaattctaaattagtaactgtttcaggcatattattagttatattaataattgtt 

25 acaataatacttcgaccatatttagatgttcttttgttaggtagagcacaagcaattaat 
ctaggtgtttcttatgagaatatgacgcgtatgtttcttattttggttgctttactcgtc 
tctatatctactgcattaataggtcctgtaacatttcttggcttattaactgtcaactta 
gcacacgaatttatgaagacctatgaacacaaatttattttaccggcaacaatattattc 
agctggattagtttatttatagcgcaatgggtagttgaaaatctattcgaagcaacaact 

30 gaattcagtttgatagtagatttagttggtggaagttacttcatatatttgcttgtcaaa 
aggagaaatgcaaattga

Sequence 1264 

MAICMAIFYLLVGLDFDIFEYQFQSRFKKFILILLVGASIGTSVVIFQSITTNRLLTPSI 
MGLDSVYLFVKVLPIFILGEQATVVTNIYLNFLITLIAMVFFSLLLFQVIFKLGHFSVYF
ILLVGVILGTFFRSITSFLQLIMNPESFLAVQNVMFASFEASNSKLVTVSGILLVILIIV
TIILRPYLDVLLLGRAQAINLGVSYENMTRMFLILVALLVSISTALIGPVTFLGLLTVNL
AHEFMKTYEHKFILPATILFSWISLFIAQWVVENLFEATTEFSLIVDLVGGSYFIYLLVK
RRNAN*


Sequence 1265 

Contig_0578_pos_5914_6432,

is similar to (with p-value 6.0e-67)

>gp:gp|AJ005352|SAA005352_3 Staphylococcus aureus, Sst putat
ive iron transport operon. NID: g3724154.

atgaacataactatcgagcaactagttaaatttggtcgttttccatattcaaaaggtcga 
atgaagcaagaagactatgacaaagtggaatatgctcttgacttattacagcttaatgaa 
attaaacatcgtaatattaaaacactatctggaggtcagcgacaacgtgcatatattgct 
atgactatagcacaagatactgattatatattattagacgagcctttaaataatttagat 

50 atgaaacattcagtacagattatgcaaacgttacgagatttatgtcgtcagttaaataaa 
acaatcattatcgtattacatgatatcaactttgcttcttgttattcagacgacatcatt
gcgcttaaacaaggtgagctagttaaagctgatgataaagataatgttattcagtctgac 
attttaaaaagtttatatgaaatggaagtacgtatagaggagataaggggacaacgtatt 
tgtctatattatgatgaaactacttttgactcagtttaa 


Sequence 1266 

MNITIEQLVKFGRFPYSKGRMKQEDYDKVEYALDLLQLNEIKHRNIKTLSGGQRQRAYIA 
MTIAQDTDYILLDEPLNNLDMKHSVQIMQTLRDLCRQLNKTIIIVLHDINFASCYSDDII 
ALKQGELVKADDKDNVIQSDILKSLYEMEVRIEEIRGQRICLYYDETTFDSV*
Sequence 1267

Contig_0578_pos_6511_7554,

is similar to (with p-value 0.0e+00)

>gp:gp|AJ005352|SAA005352_4 Staphylococcus aureus, Sst putat

atgaaaaaaacagtcttatttttattattgtctctagttttagttttaacggcttgtagt 
aatagttcgaataataattcaacttcgaaaaagaaaaatagtgattctaaagaaactgta 
accatcaaaaatagttttgaagcaagtggtaaagaaaataatggcagtgataagaaaaaa 

10 atctctaatactgtcgaagtaccaaagaatcctaaaaatgccgttgtattagattatgga 
gcgcttgatgtgttgaaagaattaggtgtggctgataaagtaaaaggtttacctaaaggt 
gaaaataaccaatctttacctaaatttttagatgaatttaaagatgataagtatattaat 
actggaaatttaaaagaagtgaactttgataaagttgcatcagctaaaccagatgtgatt 
tttatttcaggaagaacagctaatcagaaaaatttagatgaatttaaaaaagctgcacca 

15 aaagctaaagttgtatatgtaggtacaagtgatgacaacttaattaaagatatgaaaaaa 
aatacagaaaatttagggaaaatctacgataaagaagataaagctaaaaaaattaataaa
gatttagatagaaaaatatctgatatgaaagataaaactaaagactttaataagaaagta 
atgtatttattggttaacgaaggtgaactatcaacgtttggaccaggaggaagatttggt 
ggtttagtgtttgatacattaggatttaaacctgcagacaaaaaggttagcaaaagcccg 

20 catggtcaaaatataaataatgaatatattaacaagcagaatccagatgttattttagct 
atggatcgtggttcagttgtaggtggtaaagcaacaacaaatcaagttttaaaaaacaaa 
gttataaaaaatgtaaaagcagtaaaaagtaatcatatttacgaattagatccaaaacta 
tggtatttctcttcaggatcttcaacgacaactatcaaacaaattgatgaattaaatgaa 
gtagtagagaaagttgaaaaataa 


Sequence 1268 

MKKTVLFLLLSLVLVLTACSNSSNNNSTSKKKNSDSKETVTIKNSFEASGKENNGSDKKK 
ISNTVEVPKNPKNAVVLDYGALDVLKELGVADKVKGLPKGENNQSLPKFLDEFKDDKYIN 
TGNLKEVNFDKVASAKPDVIFISGRTANQKNLDEFKKAAPKAKVVYVGTSDDNLIKDMKK
NTENLGKIYDKEDKAKKINKDLDRKISDMKDKTKDFNKKVMYLLVNEGELSTFGPGGRFG
GLVFDTLGFKPADKKVSKSPHGQNINNEYINKQNPDVILAMDRGSVVGGKATTNQVLKNK
VIKNVKAVKSNHIYELDPKLWYFSSGSSTTTIKQIDELNEVVEKVEK*

Sequence 1269 
Contig_0578_pos_8973_8041,

is similar to (with p-value 7.0e-41)

>sp:sp|P18579|MURB_BACSU UDP-N-ACETYLENOLPYRUVOYLGLUCOSAMINE
REDUCTASE (EC 1.1.1.158) (UDP-N-ACETYLMURAMATE DEHYDROGENA
SE). >pir:pir|S26500|A43727 probable division initiation reg
ulatory protein 1 - Bacillus subtilis >gp:gp|M31827|BACDDSA_
2 Bacillus subtilis (clone lambda-BS1) cell division and spo
rulation protein (dds) gene, complete cds. NID: g142831. >gp
:gp|Z99111|BSUB0008_195 Bacillus subtilis complete genome (s
ection 8 of 21): from 1394791 to 1603020. NID: g2633699.

45 atgttcaaaacattgaataaaaatgacatcttacgcggattagagtcaattcttcctaaa 
gatattattaaagtggatgaacctctcaagcgttatacatatacagaaacaggaggagag 
gcagatttttatttatcccctaccaaaaatgaagaagtccaagccatcgtaaagtttgcc 
catgagaacagtataccggtaacttatttaggaaatgggtctaacattatcattcgagaa 
ggtggaattcgaggaatcgtcctcagcttattatctctcaatcatattgaaacctctgat 

gatgcaattatagcaggtagtggtgcagcaattattgacgtttcaaatgttgcacgtgac
catgtattaaccggtttagaatttgcatgcggtatccctgggtcaattggtggcgccgta 
ttcatgaatgctggtgcttatggcggagaagttaaagactgtattgactatgcattatgt 
gtcaatgaaaaaggtgatttattaaagctcactacagctgaactggaattagactataga 
aatagtgtggtacaacaaaaacatttagttgtattagaggctgctttcaccttagaacca 

55 ggtaaattagatgaaattcaggccaaaatggatgatcttactgaaagacgtgaatctaaa 
caaccgcttgaattcccttcttgcggaagtgttttccaaagaccaccgggtcattttgca 
ggtaaactcattcaagattctaatttacagggctatcgaatcggtggcgttgaagtttca 
actaagcatgcgggattcatggttaatgtagacaacggtacagcaactgattatgaagca 
cttatacatcacgtacaaaaaatagttaaagaaaaattcgatgttgaattgaatactgag 
gtacgtattataggtgatcatcccacagattaa
Sequence 1270 

MFKTLNKNDILRGLESILPKDIIKVDEPLKRYTYTETGGEADFYLSPTKNEEVQAIVKFA
HENSIPVTYLGNGSNIIIREGGIRGIVLSLLSLNHIETSDDAIIAGSGAAIIDVSNVARD
HVLTGLEFACGIPGSIGGAVFMNAGAYGGEVKDCIDYALCVNEKGDLLKLTTAELELDYR
NSVVQQKHLVVLEAAFTLEPGKLDEIQAKMDDLTERRESKQPLEFPSCGSVFQRPPGHFA
GKLIQDSNLQGYRIGGVEVSTKHAGFMVNVDNGTATDYEALIHHVQKIVKEKFDVELNTE
VRIIGDHPTD*


Sequence 1271 

Contig_0578_pos_4304_3819,

is similar to (with p-value 2.0e-60)

>gp:gp|AJ005352|SAA005352_1 Staphylococcus aureus, Sst putat
ive iron transport operon. NID: g3724154.

gtgagatacaacacctcaaaacgaccactcgttataactgcaaagttaccagttaaccag 
tttccaatgctttgtaaagcattggttctcaacgctacaaatgtagtaaaactggataaa 
ataccaccaatcataatgcctaaaagtggaacaaagattacatcttttacacggataaga 
ttaattaattggacaaataaaaacgttccaacaatacttagaacaacagcaaataataat 

ttgattaaaatgggaccattaggaaagaacaacaatgacattaaaatacctaattttgcc
cactccatagtaccagcagtcgttgggcttacaaatttattttgcatcatctgttgcata 
attaatcctgataaagctagtgaactacccgaaagtagaatactaactgttctgggaatt 
cgactcgaaaacaaaatatttatttgttcatcacttaaatggaaaatatctattagagag 
agctga 


Sequence 1272 

VRYNTSKRPLVITAKLPVNQFPMLCKALVLNATNVVKLDKIPPIIMPKSGTKITSFTRIR 
LINWTNKNVPTILRTTANNNLIKMGPLGKNNNDIKIPNFAHSIVPAVVGLTNLFCIICCI
INPDKASELPESRILTVLGIRLENKIFICSSLKWKISIRES* 


Sequence 1273 

Contig_0580_pos_1602_1976,

is similar to (with p-value 6.0e-21)

>nrl3d:pir||1GPHA Glutamine phosphoribosylpyrophosphate (prpp)
Amidotransferase (EC 2.4.2.14), chain A - Bacillus subtilis
>nrl3d:pir||1GPHB Glutamine phosphoribosylpyrophosphate (prpp)
Amidotransferase (EC 2.4.2.14), chain B - Bacillus subtilis
>nrl3d:pir||1GPHC Glutamine phosphoribosylpyrophosphate (prpp)
Amidotransferase (EC 2.4.2.14), chain C - Bacillus subtilis
>nrl3d:pir||1GPHD Glutamine phosphoribosylpyrophosphate (prpp)
Amidotransferase (EC 2.4.2.14), chain D - Bacillus subtilis

atgcttaaggattcaggagctaaccgcattcacgtaagaattgcttctcccgaattcatg 
ttccctagtttttatggtattgacgtatctacaacagctgaactcatctcagcaagtaag 
tctcctgaggaaattaaaaatcatattggtgcagattctcttgcttatttaagcgttgat
ggcttaatcgagtctataggacttgattatgatgcgccatatcatggcttgtgtgtagaa 
agttttacaggtgattatccagcaggactttacgattatgagaaaaattataaaaagcat 
ttaagtgaacgtcaaaaatcatatatagctaataataaacattattttgatagtgaggga 
aatttacatgtctaa 


Sequence 1274 

MLKDSGANRIHVRIASPEFMFPSFYGIDVSTTAELISASKSPEEIKNHIGADSLAYLSVD 
GLIESIGLDYDAPYHGLCVESFTGDYPAGLYDYEKNYKKHLSERQKSYIANNKHYFDSEG 
NLHV*


Sequence 1275 

Contig_0580_pos_3583_5061, 

is similar to (with p-value 0.0e+00)

>sp:sp|P12048|PUR9_BACSU PHOSPHORIBOSYLAMINOIMIDAZOLECARBOXAMIDE
FORMYLTRANSFERASE (EC 2.1.2.3) (AICAR TRANSFORMYLASE) /
IMP CYCLOHYDROLASE (EC 3.5.4.10) (INOSINICASE) (IMP SYNTHETASE)
(ATIC). >pir:pir|A29183|DTBSPH purH bifunctional enzyme
- Bacillus subtilis >gp:gp|J02732|BACPURF_10 B.subtilis pur
operon encoding purine biosynthesis enzymes, 12 genes. NID:
g143363. >gp:gp|Z99107|BSUB0004_100 Bacillus subtilis complete
genome (section 4 of 21): from 600701 to 813890. NID: g2632866.

atgaaaaaagcaatattaagcgtttctaataaaagtggaattgtagagtttgcaaaagca 

ttaactaatttagactatgaactgtattctacgggtggtacaaaacgtgtattagaagat
gcgaatatcaatattaagtccgtgtcagaattaacacaatttccagagattatggatggt 
cgtgttaaaacactacatccagcagtccatggtggtattttagctgatcgagataaagaa 
catcatttagagcaattacgagaacaacatattgatttaattgatatggtagtagtcaac 
ctatatcctttccaacagactgttgctcaacctgatgtaacagaaactgatgcaatagaa 

aatattgatattggtggacctacaatgttaagagcagctgctaaaaactttaaacatgtt
acaactatcgtacatccttccgattacaacgaggtaattgagagaattaaaaatcatcaa 
ttggacgaagcatatagaaaatcgctaatggttaaagttttccaacatacaaatgaatat 
gatcatgctattgttaactatttcaaagacaataaagaaacactaagatatggcgaaaat 
cctcaacaatctgcatattttgttagaacatctgatagcaaacatacgattgctggtgca 

aaacaattacatggtaaacaattgagttttaataatattaaagacgcagatgcagcgctc
agtttagtaaaaaaattcaacgagccaactgctgtagcagtaaaacatatgaacccgtgt 
ggagtaggaattggacagtcgattgatgaagcatttcaacatgcatatgaagcggataat 
caatcaatatttggcggaattatagcattgaatagaacggtagatgttaaattagctgaa 
gcattacattctatctttttagaagtagttatcgcacctcaatttactgaggaagcttta

aaaatattgacacaaaagaaaaatattcgtttattacaaatagatatgacaattgataac
gctgaacaagaatttgtttccgtttcaggtggttacttagtacaagataaagataataaa
gatgtgactcgaaatgacatgactgttgctaccgacattcaacctacagaagcacagtgg 
gaagctatgctcctaggttggaaagttgtaagtgccgttaagagtaatgcagtgatattg
agtaacaacaaacaaacagttggtataggtgcagggcaaatgaatcgtgtaggttccgct

aaaattgcaatcgaaagagcaatagaaattaacgataatgttgcgcttgtttcagatggt
ttcttcccaatgggagatacagttgaatatgctgccgaacatggtattaaggcaattatt 
caaccaggtggttcaattaaagatcaagattccattgatatggctaataaatatggcatt 
acaatggttatgacaggtatgcgtcattttaaacattaa 

Sequence 1276

MKKAILSVSNKSGIVEFAKALTNLDYELYSTGGTKRVLEDANINIKSVSELTQFPEIMDG 
RVKTLHPAVHGGILADRDKEHHLEQLREQHIDLIDMVVVNLYPFQQTVAQPDVTETDAIE 
NIDIGGPTMLRAAAKNFKHVTTIVHPSDYNEVIERIKNHQLDEAYRKSLMVKVFQHTNEY 
DHAIVNYFKDNKETLRYGENPQQSAYFVRTSDSKHTIAGAKQLHGKQLSFNNIKDADAAL 

SLVKKFNEPTAVAVKHMNPCGVGIGQSIDEAFQHAYEADNQSIFGGIIALNRTVDVKLAE
ALHSIFLEVVIAPQFTEEALKILTQKKNIRLLQIDMTIDNAEQEFVSVSGGYLVQDKDNK 
DVTRNDMTVATDIQPTEAQWEAMLLGWKVVSAVKSNAVILSNNKQTVGIGAGQMNRVGSA
KIAIERAIEINDNVALVSDGFFPMGDTVEYAAEHGIKAIIQPGGSIKDQDSIDMANKYGI 
TMVMTGMRHFKH*


Sequence 1277 

Contig_0581_pos_7809_6784, 

putative peptide of unknown function 

atgaatattgcagctgttctacaaacaattccatcactggcactacttggtttgatgata 
ccaattttcggaattgggagacttccggcaattatcgccttagttgtatatgcgttactt
cagcaagatggttccagtattattgacttagctcaccgtatgaaattaaatgaacctatc 
gatattactaaacgttatcatgatcgtagttttattcgttgtggtacgaatcaaattcca 
gacgttgttgataaagtagttaaaagcgctgtagctaaaggctatgatatgagtgatata 
caagttttggctcctatgtataaaggtaacgctggtattaagagacttaaccaagttcta 
caatctattcttaatccgaagcaacaagatgatcgtgaaatagaatttggtgaagctgtg
tttagaaaaggggataaagtacttcagttagttaatcgacctaatgataatatatttaat 
ggggatataggtataatagtaggtatattttgggccaaagaaaatgctctaaataaggat 
gtgttagttgtagattttgaaggtaatgaaattacatttactaaacaagatttaatggaa 
ctaacacatgcatattgtacatctatccataaatcacaaggttcagaatttcctattgta
attatgcctattgttagacaatattataggatgttacaacgtcccattctttatacagga
ttaactagagctaaacaatcacttgttttgcttggtgaacaagaagcatttgatataggt 
ttaaaaacaaatggacaaatacgattaacgcaattaaatgatttgttaaaatcgtatttt 
ggacaaaacaaagataatttaactacaaataaacaaacgattaacgaacaaaaagaaaat 
aacaatcatctggatttgaaaaatgaaaaagaaaatgatatccaattaaacgagtcgaca
attttccaaatcgatccaatgattaatatgggggaaatgacgccatatgacttcgttgaa 
cgttga 

Sequence 1278 

MNIAAVLQTIPSLALLGLMIPIFGIGRLPAIIALVVYALLQQDGSSIIDLAHRMKLNEPI
DITKRYHDRSFIRCGTNQIPDVVDKVVKSAVAKGYDMSDIQVLAPMYKGNAGIKRLNQVL
QSILNPKQQDDREIEFGEAVFRKGDKVLQLVNRPNDNIFNGDIGIIVGIFWAKENALNKD
VLVVDFEGNEITFTKQDLMELTHAYCTSIHKSQGSEFPIVIMPIVRQYYRMLQRPILYTG
LTRAKQSLVLLGEQEAFDIGLKTNGQIRLTQLNDLLKSYFGQNKDNLTTNKQTINEQKEN
NNHLDLKNEKENDIQLNESTIFQIDPMINMGEMTPYDFVER*

Sequence 1279 

Contig_0581_pos_3057_2749,

putative peptide of unknown function 

atgacagaacataatcacgatgctgagttaacaataaataatgaagaggaattacttacg
ttatatgatgaaaacggaaatgaagttttataccgtaaaatgttagaattttatcatcca 
gaattcaaaaaagaatatgtcgttcttgcagaagaaggtgcacaatcagatgacgaagat 
atgattgaacttgtaccaatgataaatgaacctgatgagtctggtgatggtgggaaatta 
gtccctattgaaacagatgaagaatgggatatgattgaagaagttgtaaatactgagatt 

aacgaataa

Sequence 1280 

MTEHNHDAELTINNEEELLTLYDENGNEVLYRKMLEFYHPEFKKEYVVLAEEGAQSDDED 
MIELVPMINEPDESGDGGKLVPIETDEEWDMIEEVVNTEINE* 


Sequence 1281 

Contig_0581_pos_1682_759, 

putative peptide of unknown function 

atggttgaattattagttactcctaagtcaataactcatatggaaacactaatagataaa 
ggtgcagacgcatttgttattggtgaacaaaaatttggtttaagactgccgggagaattt
aatcgtgatgctatgcaagaagctgtagcattagcccataaaaataacaaaaaagtatac 
gctgctgtgaatggtattttccataattaccacttagatgccttggaagactatattaac 
tttttacatgatattcaagtagatcgcattatatttggtgatccagctgtcgttatgtat 
gttaaacaacacgagcatccaattccattaaattgggatgctgaaactcttgtaacgaat 
tattttcagtgtaattactgggggaaaaaaggtgcaaatagagcagttttagctcgagaa
cttagtttagatgaaataattcatattaaagagcatgctgatgtagagatagaagttcaa 
gttcatggtatgacttgtatgtttcaatccaaaagaatgctattaggaaattattatact 
ttccaagagcgacaaatgaagatagaacgccaacatgattatggagacttattattatat 
gatgaagaaagagataataaatatccagtttttgaagattataatggtactcacatcatg 
tctccaaatgatatctgtttaatagaagaattagagcctttttttgaggcaggaat3gat
gcgtttaagatagatggtattttacaaagtgaagaatatataaatgtagtcacagagcaa 
tatcgagaagctatagatttatttaatgaagatccggatgcatatgaagatgaaaaattc
atgctcgttgatcctatagaagaaatacaacctgaacatcgtccattcgacgaaggtttc 
ttgtataaacaaacagtatattaa 


Sequence 1282 

MVELLVTPKSITHMETLIDKGADAFVIGEQKFGLRLPGEFNRDAMQEAVALAHKNNKKVY 
AAVNGIFHNYHLDALEDYINFLHDIQVDRIIFGDPAVVMYVKQHEHPIPLNWDAETLVTN 
YFQCNYWGKKGANRAVLARELSLDEIIHIKEHADVEIEVQVHGMTCMFQSKRMLLGNYYT
FQERQMKIERQHDYGDLLLYDEERDNKYPVFEDYNGTHIMSPNDICLIEELEPFFEAGID
AFKIDGILQSEEYINVVTEQYREAIDLFNEDPDAYEDEKFMLVDPIEEIQPEHRPFDEGF 
LYKQTVY* 

Sequence 1283

Contig_0581_pos_0_741,

putative peptide of unknown function 

atgaaaactttagaagaagttaaatccaaaacccaaaagataatgaaaaagcctgaacta 
cttgcaccagctggtaacttagagaaacttaagattgctgttcattatggtgcagatgca 
gtgtttttaggtggccaagaatatggattacgttctaatgctgataattttactatggaa
gaaattgctgaaggtgtagactttgctaatcgttatggcgctaaaatttatgttacaacg 
aatattattgcacatgatgaaaatatggagggactagaagagtacttacaaaaccttgaa 
tctacaggtgctactggtatcatagtagcggatcctcttatcatagaaacttgtaaaaaa 
gttgcgcccagattagaaattcatttatcaacacaacaatcactttcaaattataaagct 

gttgaattttggaaacaagaaggattagaccgtgttgtacttgcacgtgaaactggtgca
atggaaatgagtgaaatgaaagaaaaagttgatattgaaatagaagcgtttatacatggc 
gcaatgtgtatcgcatattctggtcgttgtactttaagtaatcatatgacagctcgagat 
tctaatcgaggtggttgttgtcagagttgtcgttgggactatgatttacttgaagttgat 
agtgatggagaattagatttatattacgacaatagtgatgttactccttttgcaatgagt 

cctaaagatttaaaattaata

Sequence 1284 

MKTLEEVKSKTQKIMKKPELLAPAGNLEKLKIAVHYGADAVFLGGQEYGLRSNADNFTME 
EIAEGVDFANRYGAKIYVTTNIIAHDENMEGLEEYLQNLESTGATGIIVADPLIIETCKK
VAPRLEIHLSTQQSLSNYKAVEFWKQEGLDRVVLARETGAMEMSEMKEKVDIEIEAFIHG
AMCIAYSGRCTLSNHMTARDSNRGGCCQSCRWDYDLLEVDSDGELDLYYDNSDVTPFAMS 
PKDLKLI 

Sequence 1285 
Contig_0583_pos_4830_3856,

is similar to (with p-value 0.0e+00)

>sp:sp|P50915|HEM2_STAAU DELTA-AMINOLEVULINIC ACID DEHYDRATASE
(EC 4.2.1.24) (PORPHOBILINOGEN SYNTHASE) (ALAD) (ALADH).
>gp:gp|U89396|SAU89396_3 Staphylococcus aureus hemCDBL gene
cluster: porphobilinogen deaminase (hemC), uroporphyrinogen
III synthase (hemD), d-aminolevulinic acid dehydratase (hemB)
and GSA-1-aminotransferase (hemL) genes, complete cds. NID:
g2589180.

atgaaatttgatagacatagaagattgcgttcatctaagacaatgcgtgatttagtaaga 

gaaactcatgttagaaaagaagatttaatatatccaatatttgtagttgagcaagatgat
ataaaaagtgaaattaaatcactaccaggcatataccaaattagtttaaatttattgcat 
gaagagattaaagaggcatatgatttaggtattagagcaatcatgttcttcggtgtgcca 
aatgacaaagacgacattggatctggtgcatatgatcataatggagttgttcaagaagcg 
acacgaatatctaagaatttatataaggatttacttattgttgcagatacttgtctttgt

gaatacacagaccacggacactgtggcgttattgacgatcatacgcatgatgtagacaat
gataaatcacttccattacttgtaaaaacagctatttctcaagttgaagctggagctgac
atcattgctccaagtaatatgatggatggttttgttgctgaaattcgtgaaggccttgat 
caagcgggatatcaaaatattcctatcatgagttatggtattaaatatgcatcaagcttt 
ttcggtccattcagagatgctgcagattcagcaccttcttttggggatagaaaaacctat 

caaatggatcctgcaaaccgtttagaggcattaagagaattggaaagtgatcttaaagaa
ggttgcgatatgatgatagttaaaccatctttaagttatctagatattattagagatgta 
aaaaataatacgaacgtgccagtcgtagcatacaacgttagtggagaatatagtatgaca 
aaagcagcagcgttaaatggttggatagatgaagagaaaattgttatggaacaaatgata 
tctatgaaacgtgcaggtgctgatttaataattacttattttgcaaaagatatctgtcgt 

tatttagataaatag

Sequence 1286 

MKFDRHRRLRSSKTMRDLVRETHVRKEDLIYPIFVVEQDDIKSEIKSLPGIYQISLNLLH
EEIKEAYDLGIRAIMFFGVPNDKDDIGSGAYDHNGVVQEATRISKNLYKDLLIVADTCLC
EYTDHGHCGVIDDHTHDVDNDKSLPLLVKTAISQVEAGADIIAPSNMMDGFVAEIREGLD
QAGYQNIPIMSYGIKYASSFFGPFRDAADSAPSFGDRKTYQMDPANRLEALRELESDLKE
GCDMMIVKPSLSYLDIIRDVKNNTNVPVVAYNVSGEYSMTKAAALNGWIDEEKIVMEQMI 
SMKRAGADLIITYFAKDICRYLDK*

Sequence 1287

Contig_0583_pos_3805_2549,

is similar to (with p-value 0.0e+00)

>gp:gp|U89396|SAU89396_4 Staphylococcus aureus hemCDBL gene
cluster: porphobilinogen deaminase (hemC), uroporphyrinogen
III synthase (hemD), d-aminolevulinic acid dehydratase (hemB)
and GSA-1-aminotransferase (hemL) genes, complete cds. NID:
g2589180.

atggagcaagctgagaaattaatgcctggcggtgttaacagtcccgtaagagcatttaaa 

tcagtagacacaccagctatttttatggatcatggtgaaggatctaaaatatatgatatt
gatggaaatgaatacattgattatgtgctaagttggggcccattaattctgggacataaa 
aatcaacaagttatatccaaattacatgaagcagtagataaaggtacaagcttcggcgct 
tcaacacttcaagaaaataaacttgctgaacttgtgattgaccgtgtaccttcaattgaa 
aaagtaagaatggtttcctcaggaactgaagctactttagacacacttcgtttagctagg 

ggttatacaggacgtaataaaattataaaatttgaagggtgttatcatggacacagtgat
tctttattgattaaagcaggatcaggtgttgcaacactaggtttacctgattcaccaggc 
gtccctgaaggtattgctaaaaacactatcacggtgccatataatgatttagattcactt 
aaattagcgttcgaaaaatatggcgatgatattgctggtgttattgttgaaccggttgct 
ggaaatatgggtgtagtgcctccagtgaatggatttctacaaggtttaagagatattact 

aatgaatatggagcattacttatatttgatgaagtgatgactggtttccgtgtaggttat
aattgtgcgcaaggatactttggtgtaacacctgatttaacttgcttaggaaaagtgata 
ggtggaggtttacccgttggagcttttggtggtaaaaaagaaattatggattacattgct 
cctgttgggactatttatcaagctggcacactttcaggtaatcctttagcaatgactagt 
ggttatgaaacattgagtcaacttactcctgaatcttatgagtattttaattctctagga

gatatacttgaaaaaggattaaaagaggtatttgctaagcataatgttccaatcacagta
aatcgcgctggttcaatgattggttacttcttaaatgaggggcctgtaacaaattttgag 
gaagcaaataaaagtgatttaaaattatttagtaatatgtatagagaaatggctaaggaa 
ggtgttttcctaccaccttcacaatttgaaggaacatttttatcaactgcacatactaaa 
gatgatattgagaaaactatccaagcatttgataatgcattaagtcgtattgtgtga 


Sequence 1288 

MEQAEKLMPGGVNSPVRAFKSVDTPAIFMDHGEGSKIYDIDGNEYIDYVLSWGPLILGHK
NQQVISKLHEAVDKGTSFGASTLQENKLAELVIDRVPSIEKVRMVSSGTEATLDTLRLAR 
GYTGRNKIIKFEGCYHGHSDSLLIKAGSGVATLGLPDSPGVPEGIAKNTITVPYNDLDSL 
KLAFEKYGDDIAGVIVEPVAGNMGVVPPVNGFLQGLRDITNEYGALLIFDEVMTGFRVGY
NCAQGYFGVTPDLTCLGKVIGGGLPVGAFGGKKEIMDYIAPVGTIYQAGTLSGNPLAMTS
GYETLSQLTPESYEYFNSLGDILEKGLKEVFAKHNVPITVNRAGSMIGYFLNEGPVTNFE
EANKSDLKLFSNMYREMAKEGVFLPPSQFEGTFLSTAHTKDDIEKTIQAFDNALSRIV*

Sequence 1289

Contig_0583_pos_2152_1133,

putative peptide of unknown function 

atgttaagtctgttattaaaaatgttacatgtgattttgccatttatgtttggaccaata 
ttagcggcgttattatgtgtaaaagtattaaaattaaaaatacgatggccattttggttg 

agtcaaattggtttaatactacttggagttcaaattggctctaccttcacacaacaagtg
attaaagacataagtaaaaattggctaactatcgtttttgtcactatcctactaatttta 
ttagctttgataattgcattcttttttaagaaaattgcacaagtaaatttagaaactgca 
attttaagtgttataccaggtgcgctaagccaaatgttagtgatggcagaagaaaataag 
aaagcaaatatattagttgtgagtttaacacagacatcacgtgtaatatttgttgttatt

ttagtaccacttatttcgtatttttttcaggataaccatcatgaaatgaatcatactaca
atggaagtacccacactttctcagactttaaatatatggcaaataatcatcttattctca 
atggtgggaatcatctatataggaatgtcaaaaattaacttccccactaaacaattatta 
gcacctataatagttttaattatatggaatatgacaacacatttaacattttcactagat 
cattggttgttagccacagcgcaacttatttatatgatacgtattggattacagattgcc 

aacttaatgagtgatttaaagggaagaattgcaatagcaatagcctttcaaaatataatg
ctcatagtcacaacgtttataatgataataggaatacatttgattactaatgaatccatc 
aatgaattgtttttaggagcagcaccaggaggtatgagtcaaatagttttagtggctatg 
gctactggagctgatgtagcgatgatttcaagctatcacatttttagaatattttttata
ttatttgtcattgcgccactaattggttattttattaatgttaaattaaataataaatga

Sequence 1290

MLSLLLKMLHVILPFMFGPILAALLCVKVLKLKIRWPFWLSQIGLILLGVQIGSTFTQQV 
IKDISKNWLTIVFVTILLILLALIIAFFFKKIAQVNLETAILSVIPGALSQMLVMAEENK
KANILVVSLTQTSRVIFVVILVPLISYFFQDNHHEMNHTTMEVPTLSQTLNIWQIIILFS
MVGIIYIGMSKINFPTKQLLAPIIVLIIWNMTTHLTFSLDHWLLATAQLIYMIRIGLQIA
NLMSDLKGRIAIAIAFQNIMLIVTTFIMIIGIHLITNESINELFLGAAPGGMSQIVLVAM
ATGADVAMISSYHIFRIFFILFVIAPLIGYFINVKLNNK*


Sequence 1291 

Contig_0584_pos_2306_3883, 

is similar to (with p-value 3.0e-72)

>sp:sp|P47994|SECA_STACA PREPROTEIN TRANSLOCASE SECA SUBUNIT.
>pir:pir|S47149|S47149 secA protein - Staphylococcus carnosus
>gp:gp|X79725|SCSECA_2 S.carnosus (TM300) secA gene. NID:
g499333.

gtgattcaaaatgaacggcaaatgcttaagtgtttaaaaaatgtacatgttgctgcaaac 
gagtatcaaggtgaactgatatttttgcataaagtcaaagatggcgctgtggatgatagc

tatggtattcaagtggcaaaattagcggatttacctaatgaagtcattgatagagcgcaa
gttatattaaatgcatttgagcaaaaaccttcgtatcaactctctcatgagaatactgac 
gatcaacaaacggttccgtcgtataacgattttggtcgaacagaagaagagcaatcagtt 
atagaaacacatacatcaaatcataattatatttttgatggtgagattgtgcttatagat 
agaataactggtcgtatgctacctggaacaaagcttcagtctggtttacatcaagctata 

gaggctctggaaaatgttgaaatttctcaagatatgagtgtgatggcaaccataacattc
caaaacttatttaagcaatttgatgaattttcaggtatgactggaacaggtaaattaggg 
gaaaaagaattctttgatttatattcaaaagttgttatagagattccgactcacagtccg
attgaacgagatgatagacctgatagagtatttgctaatggtgacaaaaagaacgatgca
attttaaagacagtgattggtatacatgaaactcaacaacctgtgttactaattacacgt 

actgcagaagcggcagaatatttttcagctgagttatttaaacgtgatatacccaacaat
ttattaatcgctcaaaatgtagctaaagaggcacaaatgattgctgaggcgggacaatta 
tctgcagttactgttgctacaagtatggcagggcgtggaactgatataaagttatcaaaa 
gaggttcatgatatcggtggcttagcagtgattattaatgaacatatggataatagccgt 
gttgatcgtcaattaagaggacgctcaggtcgccaaggagatcctggatattcacagatt 

tttgtatcacttgatgatgatttagtaaaacgttggagtaactctaacttggcagaaaat
aaaaacctccaaacgatggatgcatctaaactagaaagtagtgcactctttaaaaaacgt 
gtaaagtcaattgttaataaagcgcaacgtgtatctgaagagactgctatgaaaaataga 
gaaatggcaaatgaattcgaaaaaagtattagtgttcaacgagataaaatttatgctgaa 
cgtaatcacatacttgaagcaagcgattttgatgattttaattttgaacagcttgcacga 

gatgtgtttacaaaagacgttaaaaatcttgacttaagtagtgaacgtgcacttgtgaat
tatatatacgaaaacttaagttttgtcttcgatgaagatgtatcaaatattaatatgcaa 
aatgatgaagaaatcatacaattcttaatacaacaatttactcaacaatttaacaatcgt 
ttagaagttgctgctgattcatatttaaaacttgttattgaaaatcttaataaagttaaa 
cacatcgttaaagaataa 


Sequence 1292 

VIQNERQMLKCLKNVHVAANEYQGELIFLHKVKDGAVDDSYGIQVAKLADLPNEVIDRAQ
VILNAFEQKPSYQLSHENTDDQQTVPSYNDFGRTEEEQSVIETHTSNHNYIFDGEIVLID 
RITGRMLPGTKLQSGLHQAIEALENVEISQDMSVMATITFQNLFKQFDEFSGMTGTGKLG 

EKEFFDLYSKVVIEIPTHSPIERDDRPDRVFANGDKKNDAILKTVIGIHETQQPVLLITR
TAEAAEYFSAELFKRDIPNNLLIAQNVAKEAQMIAEAGQLSAVTVATSMAGRGTDIKLSK 
EVHDIGGLAVIINEHMDNSRVDRQLRGRSGRQGDPGYSQIFVSLDDDLVKRWSNSNLAEN 
KNLQTMDASKLESSALFKKRVKSIVNKAQRVSEETAMKNREMANEFEKSISVQRDKIYAE
RNHILEASDFDDFNFEQLARDVFTKDVKNLDLSSERALVNYIYENLSFVFDEDVSNINMQ 

NDEEIIQFLIQQFTQQFNNRLEVAADSYLKLVIENLNKVKHIVKE*

Sequence 1293 

Contig_0584_pos_4844_6181,

putative peptide of unknown function

atgaaagtcaaaagtatttcacggttcttttcaatgaagaaagtgacgctaagtttcgtt
actttatttattggagtagggacaataggttcatacaatcagtttgctgatgcaagtacg 
aaaacgcaacaaacacatgtaactaagacatctccaactcaaaagacgacgtccaatttt 
aaacgttcagttaaagatacgtctgttaaatctagagctacatcaacaaaaagagctaca 
tcaaccaaacgagctatatcacccaaaacatcatcaactaaaaaaactacaatagcaaaa
aaatctaccacagtaaataaaacgcgcacaacaaccaggactcagcctaccattcgtaag 
agttcaacaacttcaacacgttcaaaaacaatacctacttctgtgaaacgcacaacttct 
cataaagcaactactgtgtcgccaacttctaaagctaaaatatcaacaaagacacaacaa 
tcaactaaaagtcatacaacttcagttaagaaaaacactacacaactaagtaaaacaaaa 

tctccgtcaacgtcaacaaaatctaaaacagttcaatcctctacgacaaaggcacaacct
actttatcgactcaagttagtacaactactaaagcaaagcaactttcaacgccaactact 
tctaaaactgatagcagtaaagctttagtaagtttagcatctacagaacgtaaaatagat 
aaataccaatcgatgactcagttagaaaaagaaacaactgaaggtgtagattggagaaaa 
gatacaaaaaacacagggaatcaagtactcattgtggctccacatggcggaagtattgaa 

caaggtacaacagaattaactaaagcattagcagataaaggtaattatgattattattca
tttgaaggtattcgacctaaaaataactctgaattacatgtgacgtctacacattatgat 
gatccgacattaaatcaaatgattaaaaaccgtactgcaactatttcgattcatggcgca 
tcaggtactgaggagattatctatcttggtgggccccgttcagatttaagaaatgctata 
gagaagcaacttgtaggacgtggatttacagttaaagttccaccagagtatctaggtggt 

caaaataataaaaacttcattaataaagaagacaataacactggcgttcagttagaatta
acgactgctttaagaaaagcattctttaaaaatggagatactagtacaaaaaatcgtacc 
aataaagaaaattggacaccaacaatggaagcatttattaatgcattatatgaaggtatc 
aatcaaacgtattcataa 

Sequence 1294

MKVKSISRFFSMKKVTLSFVTLFIGVGTIGSYNQFADASTKTQQTHVTKTSPTQKTTSNF 
KRSVKDTSVKSRATSTKRATSTKRAISPKTSSTKKTTIAKKSTTVNKTRTTTRTQPTIRK 
SSTTSTRSKTIPTSVKRTTSHKATTVSPTSKAKISTKTQQSTKSHTTSVKKNTTQLSKTK 
SPSTSTKSKTVQSSTTKAQPTLSTQVSTTTKAKQLSTPTTSKTDSSKALVSLASTERKID 

KYQSMTQLEKETTEGVDWRKDTKNTGNQVLIVAPHGGSIEQGTTELTKALADKGNYDYYS
FEGIRPKNNSELHVTSTHYDDPTLNQMIKNRTATISIHGASGTEEIIYLGGPRSDLRNAI 
EKQLVGRGFTVKVPPEYLGGQNNKNFINKEDNNTGVQLELTTALRKAFFKNGDTSTKNRT 
NKENWTPTMEAFINALYEGINQTYS*

Sequence 1295

Contig_0584_pos_6957_9221, 

is similar to (with p-value 4.0e-31)

>sp:sp|P13485|TAGF_BACSU TEICHOIC ACID BIOSYNTHESIS PROTEIN F.
>pir:pir|S06049|S06049 rodC protein - Bacillus subtilis
>gp:gp|X15200|BSRODC_2 Bacillus subtilis rodC operon. NID:
g40098. >gp:gp|Z99122|BSUB0019_69 Bacillus subtilis complete
genome (section 19 of 21): from 3597091 to 3809700. NID:
g2636029.

gtgaaaagcttattagaaattggacatgaggtacactactttaattatcaggactataat 
aaaagtgatatcacaaaactaattatttacgaaggtttgagcacaaagcatcttcatatt
catcaatttaatagtggaaaagaacttgctcatggagacctacttataattactagagaa 
accttttttaatcatgcatatctagttaaaaaattaaatagcaagattaagattgttggt 
gaaatacatggtccattggaatatattaatgagaatatagatttagcattagactgtatt 
gattgtgttcgagtgagtacagctagaattaaaaatgaatttatagctaaatatgactat 
catcgggtttttaatcaatacgtaaatgcacaacatatcgatttaaaatcagagccgata
aatactaaacgaaattttttaattaaagcacgttttgaggatgaagttaaagatatttca 
tatattattaaattgtttaattacatcattaaaaaccaaattgttgatgatgcLcaactt 
tatttaataggatatggtccttcagaaatgctttacaaaaatttgataaattactatcat 
cttaatgattatattcatattaatgaaaaagaaccacagagttatatttatgtatctagt 
tcgccatatgaaacgctcggctattctatattagaaacgattgcacagggtaataaagct
ctagtttactatggtgatgataacgtgttaaaggatatctatgcaccatatgaagcgata 
cgttttttaaccaaagatatgattaaagatagtaaaataattaaagactttctaaactat 
aaatatagtcactgtgatcgacaaaaagattatcgacagttgaaaagtacgtttaaatgc 
attaattatggacaggaatttttaaataatgttgaaactttctcttcatctcaacatgta
aaagtgaagaaaattcatcgacatctcggtagtgaaaaacaaatagatatagcaagtcgt
ttaaaagagagtcgttggatgaatttaattagaaaaaataaatatttatttaataaatgt 
aaaacgtactatgaaaaaagaacgcatatgagctatatcaaaaatttaaatcaaatacct 
gtagacgacgattcaatatttatagagtctttccatggcaagaattttagtggagatcct 
aaatatattgctcttgctattaagagacagtatgatcataaaaaaatatatgtgagttca
accaattcacttgttgatatggaaatcaaacgttacggttttacacctgttcgatttgga 
agcgagaaatatattaaaacgtttagaaagtgtaagtatgtttttatcaatggtaactcg 
tgggataaagtgtacaagtcttcagatcagatatttattcaaacatggcacggttttcca 
ttaaagaaaatggttaatgatttaaatgaacaacatgaaagacaacaacaactagaggca 

ttcataccacgcatgaaaaaatgggattacattttgacatcatcagatattaatacgacg
ttgttggaatctgcttttatgctaaataaaaatccaaatcttaaagttctagaatacggc 
gcacctaagaatgaatatttaataaataataataatttacaagagcgccagcagttacag 
cttaaatatatgtataagatagatgatgataaaaaatatatattatattgtcccacttgg 
agggaaaatcaaagaaaagaagtcactcagattaatttaaaagatttacttaaatattta 

ccagagaattatgagattattgtgaaacttcatcctaatgaaagtcatttaagaaccaga
tataatcaaatagataatcgaattcactgttatttcaatgaacttgttgatattcaagaa 
ctgtatattctgagtgaatgtatgattacagattactcgtcgaccatttttgactatata 
catttaaacaagccagtctttattcttcaagaagacgagcaacaatataaacaaagtgtt
ggtttttattttgatttgtttgaagtgggtgattttcttaaagcctctttaaatgaacgc 

atgttagctaaacaaatttgtagcactgattatataaattattcaaaagtggttcatcgt
ttgatgaaacaagatagttcgaaaagcagtgaaaagttaatggccgaaattcttggggaa 
ccagaatatccaagttcatcaaactgcaaacaacagatttcttaa 

Sequence 1296 

VKSLLEIGHEVHYFNYQDYNKSDITKLIIYEGLSTKHLHIHQFNSGKELAHGDLLIITRE
TFFNHAYLVKKLNSKIKIVGEIHGPLEYINENIDLALDCIDCVRVSTARIKNEFIAKYDY 
HRVFNQYVNAQHIDLKSEPINTKRNFLIKARFEDEVKDISYIIKLFNYIIKNQIVDDAQL
YLIGYGPSEMLYKNLINYYHLNDYIHINEKEPQSYIYVSSSPYETLGYSILETIAQGNKA
LVYYGDDNVLKDIYAPYEAIRFLTKDMIKDSKIIKDFLNYKYSHCDRQKDYRQLKSTFKC

INYGQEFLNNVETFSSSQHVKVKKIHRHLGSEKQIDIASRLKESRWMNLIRKNKYLFNKC
KTYYEKRTHMSYIKNLNQIPVDDDSIFIESFHGKNFSGDPKYIALAIKRQYDHKKIYVSS
TNSLVDMEIKRYGFTPVRFGSEKYIKTFRKCKYVFINGNSWDKVYKSSDQIFIQTWHGFP
LKKMVNDLNEQHERQQQLEAFIPRMKKWDYILTSSDINTTLLESAFMLNKNPNLKVLEYG 
APKNEYLINNNNLQERQQLQLKYMYKIDDDKKYILYCPTWRENQRKEVTQINLKDLLKYL 

35 PENYEIIVKLHPNESHLRTRYNQIDNRIHCYFNELVDIQELYILSECMITDYSSTIFDYI 
HLNKPVFILQEDEQQYKQSVGFYFDLFEVGDFLKASLNERMLAKQICSTDYINYSKVVHR 
LMKQDSSKSSEKLMAEILGEPEYPSSSNCKQQIS* 

Sequence 1297 
Contig_0584_pos_13726_14622,

putative peptide of unknown function 

atgaagtttgcatatattcaatcgattcgtaatgagatttcaattattttaataattcta 
ttattttttgcgcttatattttatgtgttttctttaccttttgatgcatacgtactagca 
atcagtataatattactattgatgtgtgtacgttggtggataaagtatttaagttttaaa 

aagaatgaacatcttaaagataaagtagcatatttagaacatgagttagcacatgttaag
aatcagcaaattgaatatcgtaacgatgttgaaagttattttttaacatgggtacatcaa 
attaaaacacctatcactgcctcacaattacttttggagagaaacgaggagaatgtagtt 
aatcgtgttcgacaagaaattgtgcacattgataattatacaagtctcgcattaagttat
ttaaaattattaaatgaagagtcagatatgacaattaccaaagtgacagttgatgatttg 

attcgcccgttgattttaaaatatagaattcagtttattgaacaaaagacgcaaatccat
tatgaaaaaagtgaggacattattttaaccgatgcacaatgggcttctataatgatagag 
caacttttaaataatgctttaaaatatgctaaaggtaaagatatttggatagattttgat 
gttgccaatcaaactctacagattaaagataatggtattgggattagtaaagcagatatt 
cctaaaatttttgataaaggatactcaggatttaacggtagattgaatgaacaatcaact 

ggtataggtctatttatagtgcaacacattgcaaatcatttaaatatacaagtaactgta
caatcagagttgaatcatgggacagtattttttatacattttactaaagaaaaataa 

Sequence 1298 

MKFAYIQSIRNEISIILIILLFFALIFYVFSLPFDAYVLAISIILLLMCVRWWIKYLSFK
KNEHLKDKVAYLEHELAHVKNQQIEYRNDVESYFLTWVHQIKTPITASQLLLERNEENVV
NRVRQEIVHIDNYTSLALSYLKLLNEESDMTITKVTVDDLIRPLILKYRIQFIEQKTQIH
YEKSEDIILTDAQWASIMIEQLLNNALKYAKGKDIWIDFDVANQTLQIKDNGIGISKADI 
PKIFDKGYSGFNGRLNEQSTGIGLFIVQHIANHLNIQVTVQSELNHGTVFFIHFTKEK* 


Sequence 1299 

Contig_0584_pos_14789_0,

is similar to (with p-value 5.0e-41)

>sp:sp|P42423|YXDL_BACSU HYPOTHETICAL ABC TRANSPORTER ATP-BINDING
PROTEIN IN IDH 3'REGION. >gp:gp|D14399|BACIOLO_13 Bacillus
subtilis 15 kb chromosome segment contains the iol operon.
NID: g709980. >gp:gp|Z99124|BSUB0021_68 Bacillus subtilis
complete genome (section 21 of 21): from 3999281 to 4214814.
NID: g2636442. >gp:gp|D45912|D45912_2 Bacillus subtilis genome
sequence between the iol and hut operon, partial and complete
cds. NID: g1408482.

atggctttaaatcaaatgaaccttgaaattgatgaaaatgaatttgtagcaatcatgggc 
gagtcaggttcaggtaaatctacattactcaatttaattgctacttttgatcgtacaact 
gaaggattaataaagttagacgagttgccgcttaatcaattgaagaataaagacattgca 

cgctttcgcagagaaatgatgggatttgtgtttcaagattttaatgtgttgaatacgatg
tcgaacaaagataatattttgatgcctcttgtacttgcaaatgaacgtccgaaaataatg 
caaaaacgcttaatggaaataagtgaacaattaggaattgaagacttgcttgaaaaatat 
ccgtctgaaatatctgggggacaaaaacaacggatagctatagcccgtgcgttgatagca 
cgacctaaattattattagctgatgaacccactggtgcacttgattcaaaaacctctaaa

aaccttatgtgtttatttcgaaaaattaatcaaaagcatcaaactatattaatggtgaca
cattcaaatattgacgcgtcatatgcgaaccg 

Sequence 1300 

MALNQMNLEIDENEFVAIMGESGSGKSTLLNLIATFDRTTEGLIKLDELPLNQLKNKDIA 
RFRREMMGFVFQDFNVLNTMSNKDNILMPLVLANERPKIMQKRLMEISEQLGIEDLLEKY
PSEISGGQKQRIAIARALIARPKLLLADEPTGALDSKTSKNLMCLFRKINQKHQTILMVT 
HSNIDASYANR 

Sequence 1301 
Contig_0584_pos_12548_11079,

is similar to (with p-value 0.0e+00)

>sp:sp|P19405|PPB3_BACSU ALKALINE PHOSPHATASE III PRECURSOR
(EC 3.1.3.1) (APASE III). >pir:pir|B39096|B39096 alkaline
phosphatase (EC 3.1.3.1) III precursor - Bacillus subtilis

atgaaatttatgaacaaaatgggtaagacgacgcttgcttcatcaatcgtagcagcatcc
gttttaagtacggtaaacgtatcatatgcttcaggtagctcagaacaaagtgctcaaact 
aagcaaacacaaaacgatgccattgctttcggcaacacaaaaaatccaaaaaatgtcatc 
ttcatggttggcgatggtatgggaccttcttttaacactgcatatcgttattataaaaat 
aagcctggtgctaagaaaatgactccaactgcattcgataaatatctaaaaggaacaaat 

cgtacttattctaatgatcctaaagaaaatgttacagactctgctgctggaggaacagct
tttagtaccggtcacaaaacatataacggtgcgattagtgttgatacaaataaaaaacca 
att aaatctgtgctagaacaagctaaagaacaaggaaaatcaactggtttagtaactact 
gctgaacttactgatgcaacacctgctgtatatgctgctcatgtagattcacgcgacaaa 
aaagatgaaattgcacagcaattttataatgataaaataaatggtaaacataaagtcgat 

gtgatgttaggtggcggtgcaaaatacttcggtaaagaaaataaaaatttagcgaaaaaa
ttcaaaaaagatggttatgatatcgtttctaataaagatgaattaaatcaatcacaaagc 
aagcaagttttaggtactttctcagaaaaagatatgccattacaaatagatgcacctcaa 
tctaatccgttgctagtagacatgcaaaacagtgcactaaataaattaagtaaaaataat 
aaaggattctttttaatggttgagggtgcttcaattgataaagctgcccaccctaatgat 

atcactggtgtgatgtctgaaatgtctggtttcgaaactgcttttgataatgctattaat
tatgcaaagacacataaagatacacttgttgtagcaactgcagaccactcaactggcggt 
ctatcaaccgcaaaaggtaaagattataaatggaatccagaggctattcacaagatgaaa 
cattctggaatgtatatgacaaaacaaatcgctgatggaaaagatcctgaaaaagtaatt 
aaagatggatacggtattgatttcccaaataaacaactcgataaagtcaaaaaagcagca
gacgagcttcacaaattacaaaaagaaggtaaagatgacaaagacgaaaaagttgtagaa
caaacaacaaaattacaaaatgcaattcaaaaaccaattaacgatgcttcacacacaggt 
tggacaaccaatggccatacaggtgtagatgttaacacatatgcatatgggccaggttct 
aacaaattcaaagggaatatggaaaatacccaaagcgctaaaaacttatttgactttttc 
ggaaacaatgtaacatcaaatcaaaattaa

Sequence 1302 

MKFMNKMGKTTLASSIVAASVLSTVNVSYASGSSEQSAQTKQTQNDAIAFGNTKNPKNVI 
FMVGDGMGPSFNTAYRYYKNKPGAKKMTPTAFDKYLKGTNRTYSNDPKENVTDSAAGGTA
FSTGHKTYNGAISVDTNKKPIKSVLEQAKEQGKSTGLVTTAELTDATPAVYAAHVDSRDK
KDEIAQQFYNDKINGKHKVDVMLGGGAKYFGKENKNLAKKFKKDGYDIVSNKDELNQSQS
KQVLGTFSEKDMPLQIDAPQSNPLLVDMQNSALNKLSKNNKGFFLMVEGASIDKAAHPND
ITGVMSEMSGFETAFDNAINYAKTHKDTLVVATADHSTGGLSTAKGKDYKWNPEAIHKMK
HSGMYMTKQIADGKDPEKVIKDGYGIDFPNKQLDKVKKAADELHKLQKEGKDDKDEKVVE 

QTTKLQNAIQKPINDASHTGWTTNGHTGVDVNTYAYGPGSNKFKGNMENTQSAKNLFDFF
GNNVTSNQN* 

Sequence 1303 
Contig_0584_pos_2285_1518, 
is similar to (with p-value 2.0e-22)

>sp:sp|P09122|DP3X_BACSU DNA POLYMERASE III SUBUNITS GAMMA
AND TAU (EC 2.7.7.7).

atgttatacaggatgatagatatcatcaatgatacactagtatccattaggttcagtgta 
aatcaaagtgttcattttgaagtgttgctagttaaacttgcagaaatgattaagacacag 

cctcaaactgtacaaaatgtagcaacagcatcggtagctaatgaaccagataatgagatg
ttattacaacgtttagaacaacttgaaaatgagcttaaaaccttaaaagaacaagggatc 
aaaactaataaagttagtcaacaacctaagaaaccaacacgtacgattcaacgatctaaa 
aatacgttttctatgcaacaaatagcgaaagtattagacaaagcaaacaaagatgatatc 
aaattgttgaagaaccattggcaagaagtgattgatcatgcaaaaagtaatgataaaaag 

tctttagtaagtttgctactgaattcagaaccagtagcagctagtgaagatcatgtgtta
gttaaatttgatgaagaaattcattgtgaaatagtaaataaagatgatgaaaagagaaac 
aatattgaaagtgtagtttgtaatatagttaataaaactgtcaaagtagttggagtgccg 
gctgaccaatggctgagagtgagagcagagtacttacaaaatcgtaacaccaatgaaaca 
catcaaagcgaaaaacaaagcacacaacagtctcaacaaatagatattgctcaaaaagct 

aaagacttatttggtgaggaaactgtacacttagttgatgaagactga

Sequence 1304 

MLYRMIDIINDTLVSIRFSVNQSVHFEVLLVKLAEMIKTQPQTVQNVATASVANEPDNEM 
LLQRLEQLENELKTLKEQGIKTNKVSQQPKKPTRTIQRSKNTFSMQQIAKVLDKANKDDI 
KLLKNHWQEVIDHAKSNDKKSLVSLLLNSEPVAASEDHVLVKFDEEIHCEIVNKDDEKRN
NIESVVCNIVNKTVKVVGVPADQWLRVRAEYLQNRNTNETHQSEKQSTQQSQQIDIAQKA 
KDLFGEETVHLVDED* 

Sequence 1305 
Contig_0584_pos_1105_725,

is similar to (with p-value 3.0e-52)

>pir:pir|S13788|S13788 recM protein - Bacillus subtilis
atgcattatccagaacctatatcaaagcttatcgatagttttatgaaactgccaggcatt 
ggaccaaagacggctcaacgtctggcttttcatactttagatatgaaagaagacgatgtt 
50 gttaagtttgctaaagcactagttgatgttaaaagagaacttacctattgtagtgtttgt 
gggcatattacagaaaatgatccttgttatatatgtgaagataaacagcgagatcgttct 
gtcatatgtgtagttgaagatgacaaggatgtcatagcaatgaaaaaatgcgtgaatata 
aaggtttatatcacgtgcttcatggttcgatttcaccaatggatggtattgggcctgaag 
acatcaatatacctgcattag 


Sequence 1306 

MHYPEPISKLIDSFMKLPGIGPKTAQRLAFHTLDMKEDDVVKFAKALVDVKRELTYCSVC 
GHITENDPCYICEDKQRDRSVICVVEDDKDVIAMKKCVNIKVYITCFMVRFHQWMVLGLK 
TSIYLH*

Sequence 1307
Contig_0585_pos_2660_3478, 
putative peptide of unknown function 
atgaaaataaatgttttatgcgagaagagggacaatatggatattaaacaatcttcagag
aaacaaggtcgaccgcatcatttatcagacagtaggacagttttaaaaagaaattttata 
ttaataccagcttatattttattacaaagtatcgtaccaatcattgttgttttcggctca 
ttagggatcactgccatgataacacaacaggcaccaccacaatggttgtatcatttttca 
ttaagtttaagttttgtgattgctcaaggtctaatattagttatcttttataaaatgcat 

caatctgtaataaatgatgtgatgaagcaacaatggatagttgcaaagaataaaataatt
aaaattgtaatagttgcgattgtcgtatatttattattacttataatgcgggtgattgga 
acatcattacctaatcatttaagttatcatctcacgcaatacgaacaacgtacgctaggg 
ctatttaaatcaccatatgtgttgttagttacttttatatccatggtattcttacgtcca 
atggtagaacaaatcatttatagatatctcatcatccatgaattaggaaaagtatggaat 

agacaatttgtaattggtttgtctattcttattgaaacgatcgtacatgtttacgacatg
tcatcgatttttgaaatttttccatatatcgttattgctacggcagctacaatactatat 
attaaatcgcgggataatttaattgtcgcttatatatttcaagtgattttgcagtgtatc 
ctttttatagaaattttatgtaagtataccaacttttaa 

Sequence 1308

MKINVLCEKRDNMDIKQSSEKQGRPHHLSDSRTVLKRNFILIPAYILLQSIVPIIVVFGS
LGITAMITQQAPPQWLYHFSLSLSFVIAQGLILVIFYKMHQSVINDVMKQQWIVAKNKII 
KIVIVAIVVYLLLLIMRVIGTSLPNHLSYHLTQYEQRTLGLFKSPYVLLVTFISMVFLRP
MVEQIIYRYLIIHELGKVWNRQFVIGLSILIETIVHVYDMSSIFEIFPYIVIATAATILY 

IKSRDNLIVAYIFQVILQCILFIEILCKYTNF*

Sequence 1309 

Contig_0585_pos_8148_8567,

is similar to (with p-value 2.0e-20)

>sp:sp|P36922|EBSC_ENTFA EBSC PROTEIN. >pir:pir|C49939|C49939
ebsC protein - Enterococcus faecalis >gp:gp|L23802|ENEEBSA_3
Enterococcus faecalis pore forming, cell wall enzyme,
regulatory, and dehydroquinase homologue proteins (ebsA, ebsB,
ebsC, and ebsD) genes, complete cds with repeat region.
NID: g388106.

atgaatacttacgaagttactgacaagcatcaacatggagaagagattgcacaactcgta 
ggtgctaaaatagaagaagtctttaaaacacttgtactagagaattccaatcatgaacac
tatgtttttgtcattccagttaatgaaaccttagatatgaagaaggcggctcatgttgtt 
aatgaaaagaaattgaatttaatgcctctcgatcaattaaaacaagtaacagggtatgtt 
agaggaggatgttcacctatcggtatgaaacattcctttaaaacgacgattgatgcttcc
gctaaaaatttagaaaaagtttatattagcggaggtcaaagaggaatgcaaattatcatt
catgtgaatgatttaattgacatgacaaaggctcaggtagaatctattacacagaattaa 



Sequence 1310

MNTYEVTDKHQHGEEIAQLVGAKIEEVFKTLVLENSNHEHYVFVIPVNETLDMKKAAHVV
NEKKLNLMPLDQLKQVTGYVRGGCSPIGMKHSFKTTIDASAKNLEKVYISGGQRGMQIII
HVNDLIDMTKAQVESITQN*

Sequence 1311

Contig_0585_pos_8862_0, 

is similar to (with p-value 0.0e+00)

>sp:sp|P33166|EFTU_BACSU ELONGATION FACTOR TU (EF-TU) (P-40).
>pir:pir|A60663|A60663 translation elongation factor Tu -
Bacillus subtilis >gp:gp|Z99104|BSUB0001_113 Bacillus subtilis
complete genome (section 1 of 21): from 1 to 213080.
NID: g2632267. >gp:gp|D64127|D64127_6 Bacillus subtilis genes
for RNA polymerase beta subunit, ribosomal proteins L12 and S7,
elongation factors G and Tu and ribosomal proteins S10 and L3,
partial and complete cds. NID: g1644218.
atggcaaaagaaaaatttgatcgctcaaaagaacatgccaatattggtactatcggtcac 
gttgaccatggtaaaacaactttaacagctgctatcgcaactgtattagctaaaaatggt 
gacactgttgcacaatcatacgatatgattgacaacgctccagaagaaaaagaacgtggt 
attacaatcaatactgcacatatcgaataccaaactgacaaacgtcactatgctcacgtt
gactgcccaggacacgctgactatgttaaaaacatgatcactggtgcagctcaaatggac 
ggcggtatcttagttgtatctgctgctgacggtccaatgccacaaactcgtgaacacatc 
ttattatcacgtaacgttggtgtaccagcattagttgtattcttaaacaaagttgacatg 
gtagacgacgaagaattattagaattagttgaaatggaagttcgtgacttattaagcgaa 

tatgacttcccaggtgacgatgtacctgtaatcgctggttctgcattaaaagcattagaa
ggcgatgctgaatacgaacaaaaaatcttagacttaatgcaagcagttgatgattacatt 
ccaactccagaacgtgattctgacaaaccattcatgatgccagttgaggacgtattctca 
atcactggtcgtggtactgttgctacaggccgtgttgaacgtggtcaaatcaaagttggt 
gaagaagttgaaatcatcggtatgcacgaaacttctaaaacaactgttactggtgtagaa

atgttccgtaaattattagactacgctgaagctggtgacaacatcggtgctttattacgt
ggtgttgcacgtgaagacgtacaacgtggtcaagtattagctgctcctggttctattaca 
ccacacacaaaattcaaagctgaagtata 

Sequence 1312 

MAKEKFDRSKEHANIGTIGHVDHGKTTLTAAIATVLAKNGDTVAQSYDMIDNAPEEKERG
ITINTAHIEYQTDKRHYAHVDCPGHADYVKNMITGAAQMDGGILVVSAADGPMPQTREHI 
LLSRNVGVPALVVFLNKVDMVDDEELLELVEMEVRDLLSEYDFPGDDVPVIAGSALKALE 
GDAEYEQKILDLMQAVDDYIPTPERDSDKPFMMPVEDVFSITGRGTVATGRVERGQIKVG
EEVEIIGMHETSKTTVTGVEMFRKLLDYAEAGDNIGALLRGVAREDVQRGQVLAAPGSIT 

PHTKFKAEVX

Sequence 1313 

Contig_0585_pos_7928_6480,

is similar to (with p-value 0.0e+00)

>sp:sp|P32397|HEMG_BACSU PROTOPORPHYRINOGEN OXIDASE
(EC 1.3.3.4) (PPO). >pir:pir|D47045|D47045 coproporphyrinogen
III oxidase, protoporphyrinogen IX oxidase - Bacillus subtilis
>gp:gp|M97208|BACHEMEHY_4 Bacillus subtilis penicillin binding
protein 1A (ponA) gene; uroporphyrinogen decarboxylase (hemE)
gene; ferrochelatase (hemH) gene complete cds, (hemY) gene,
complete cds; ORFA, complete cds; ORFB 5' end. NID: g143041.
>gp:gp|Z99109|BSUB0006_90 Bacillus subtilis complete genome
(section 6 of 21): from 999501 to 1209940. NID: g2633260.
>gp:gp|Y14083|BSY14083_8 Bacillus subtilis chromosomal DNA,
region 76-78 degrees: between glyB-aprE. NID: g2226224.

atggcgtattttaataaatttttaagcaattctgaaaggaagcgtggaaaagtgagtaag 
aaagtggcaattattggagcgggaatcactggtttatctagcgcatatttcattaaaaaa 
caagacccttctattgaagtaactatcttcgaagcctcaaatagagtaggtggaaagatt 
caaacatatagatcagatggttacacaattgagttaggccctgagtcttatttaggtcgt

aagacaattatgactgatgtggcaaaagatattggattagaaaatgaccttataacaaat
actactggccaatcttatatttttgctaaaaataaattatatcctattcctggtggctca 
attatgggaattcctacagatattaaaccatttattaaaacaagactcatttcacctatt 
ggtaaattaagagcggggcttgatttgtttaaaaaaccgatagaaattgaagatgatatt 
tctgttggtagtttctttagacaacgattaggtaatgaagtattagagaacttaattgaa 

ccactaatgggtggtatttatggcactgatattgatcaattgagcttaatgagtacattt
cctaactttaaggaaaaagaggaacaatttggtagtttgattaaaggaatgaaagacgaa 
aaagaacaacgtattaagaaacgtcaattatatccaggtgctcctaaaggacaattcaaa 
cagtttagacacggattgagttcttttatagaggctcttgttaaagatattgaaagtaaa 
ggtgtccacatacgatataacacgccagtcaaagatatattgatttcgcaaaaagattat 

gaaattttattagaagatgacagtaaagagaaatttaatggcttacttgtaacaacacca
catcaagtatttctgaactggtttagtcacgatccagcatttgattactttaaaaacatg
gattctactactgtcgcaacagttgttttggcctttgatgagaaaaatattaccaatacg 
tacgatggaactggctttgttattgcaagaacaagtcaaacggatattactgcatgtact 
tggacatcaaagaaatggccatttactactccagaaggtaaagttttaattcgagcatat
ataggaaaaccaggtgatactgtagtagatgatcacaccgatgaagaaatagtatcaatt
gttagaaaagacttaagccaaatgatgaccatctcagggaatcctgattttacaattgta 
aatcggttacctaagagtatgccccaataccacgtgggtcatattaaaatgattaaagaa 
attcaacaacatattaaaacaacttatcctagattacgtgttacaggggcaccgtttgaa 
gctgtcggtttaccagactgcatacaacaaggtaagaatgcagttgatgaaatattagaa
gagttataa 

Sequence 1314 

MAYFNKFLSNSERKRGKVSKKVAIIGAGITGLSSAYFIKKQDPSIEVTIFEASNRVGGKI
QTYRSDGYTIELGPESYLGRKTIMTDVAKDIGLENDLITNTTGQSYIFAKNKLYPIPGGS
IMGIPTDIKPFIKTRLISPIGKLRAGLDLFKKPIEIEDDISVGSFFRQRLGNEVLENLIE 
PLMGGIYGTDIDQLSLMSTFPNFKEKEEQFGSLIKGMKDEKEQRIKKRQLYPGAPKGQFK
QFRHGLSSFIEALVKDIESKGVHIRYNTPVKDILISQKDYEILLEDDSKEKFNGLLVTTP 
HQVFLNWFSHDPAFDYFKNMDSTTVATVVLAFDEKNITNTYDGTGFVIARTSQTDITACT 
WTSKKWPFTTPEGKVLIRAYIGKPGDTVVDDHTDEEIVSIVRKDLSQMMTISGNPDFTIV
NRLPKSMPQYHVGHIKMIKEIQQHIKTTYPRLRVTGAPFEAVGLPDCIQQGKNAVDEILE 
EL* 

Sequence 1315 
Contig_0585_pos_6344_5928,

putative peptide of unknown function 

atgacagacattataattgtacactcaaaacatggtaattctaaaaatcattggtatgaa 
tggttaaggcataatttaactttggaagggtatgatgtttctttattcaatcttgaagca 
aatgatcatgctcaaattgatgagtgggttaatgaaatgaaacaacaactacatatccgt 
aaaaaagatacatattttgtgacccacggatttggctcaatcgctgctttaaaatttctt
gcagaaacgcatcatcacattgaaggtttctttagtatcgcaggatttaaagaagatgca 
caagatatagacgaagatgtagatttaaaaggggtaaccatcgattacgataaaataaaa 
gagcaagtagataaattttatggactcacgtctaaagatgatcaatatgtttcataa 

Sequence 1316 

MTDIIIVHSKHGNSKNHWYEWLRHNLTLEGYDVSLFNLEANDHAQIDEWVNEMKQQLHIR 
KKDTYFVTHGFGSIAALKFLAETHHHIEGFFSIAGFKEDAQDIDEDVDLKGVTIDYDKIK 
EQVDKFYGLTSKDDQYVS* 

Sequence 1317 
Contig_0585_pos_2551_974,
is similar to (with p-value 0.0e+00)

>gp:gp|Z99107|BSUB0004_107 Bacillus subtilis complete genome
(section 4 of 21): from 600701 to 813890. NID: g2632866.
>gp:gp|Y15254|BSYERABCD_4 Bacillus subtilis 13kB DNA fragment,
from yerA to sapB gene. NID: g2577959.
atgtcagttttaactgtcatgcaattcatagtcaatattatcatcatgattgtgttatta 
acgattatgattcttggggttatttggttatttaaagacaaaggtcaaaatcaacacagt
gtactaagaaattttcctgttttgggtcgaatacgttatatttctgaaaaaatcggtccc
gaattaagacaatatttcttcgctaacgataatgaaggtaaacctttttcacgaagtgat
tataaaaatattgttttagctggaaaatataaatcaagaatgactagtttcggtacaggt
aaggattatgaagaagggttttatattcaaaatacgatgttcccacttcaagcaactgaa 
ttacatatcgatcatactgaattcatttctacatttttatatcatattgagaatgagcgc 
ctatttagtagagaagaatacagaaaaagcgctcaggttgatccgtttttcttaactgat 
gaacatgcagtagtattgggctctaaccttaagcatccctttaaaatcaaacgcttagtt 
ggtcaatctgggatgagttatggcgctttaggtaaaaatgcaattactgcactgtcaatg 
gggttagctaaagctggtacatggatgaatacaggtgaaggtggattatctgaatatcat 
ttgaaaggtaatggtgacatcatctatcaaattggtccaggactctttggggtaagagat 
catgatggcaattttaatagagacatgtttatcaatcttgccgaacacaataatgtacgc 
gcatttgaaattaagttagctcaaggtgctaaaacacgtggtggacatatggagggaaac 
aaagtcacagaagagattgcacgcattagaaatgtgaaaccatatgaaactattaattca 
cctaatcgttttgattttattaaaaatccaacagatttactgaatttcgttaatcattta 
caatcgataggtcaaaaacctgtcggcttcaaaattgttgtcagtaaagttgaagaaata 
gaggcgttagttaaaacaatggtagagatagacacctatccaagctttattactgttgat
ggtggtgaaggtggtacaggcgctaccttccaagagcttgaagatggtgttggtttaccg
ttatttacagcacttcctatcgtttcaagtatgttagaaaagtatggcataagaaacaag 
gttaaaatttttgcgtccggtaaattagtgactccagataaaatcgcaattgcattagga 
ttaggtgcggatctcgtcaatattgctagaggtatgatgataagtgtaggatgcatcatg 
agtcaacaatgtcatttaaatacatgtccagttggagtagcaacaaccgatcctaaaaaa
gaaaagggacttattgttgatgaaaaacaataccgtgttacaaattatgttacaagtttg 
catgaaggtttatttaacatcgctgcagctgtaggtgttcatagtccaacggagattact 
tccgaccatattatctatagacaattagatggcactacaacgtccattcaggattataaa 
cttaaattaatttcttaa 

Sequence 1318 

MSVLTVMQFIVNIIIMIVLLTIMILGVIWLFKDKGQNQHSVLRNFPVLGRIRYISEKIGP 
ELRQYFFANDNEGKPFSRSDYKNIVLAGKYKSRMTSFGTGKDYEEGFYIQNTMFPLQATE 
LHIDHTEFISTFLYHIENERLFSREEYRKSAQVDPFFLTDEHAVVLGSNLKHPFKIKRLV 

GQSGMSYGALGKNAITALSMGLAKAGTWMNTGEGGLSEYHLKGNGDIIYQIGPGLFGVRD
HDGNFNRDMFINLAEHNNVRAFEIKLAQGAKTRGGHMEGNKVTEEIARIRNVKPYETINS 
PNRFDFIKNPTDLLNFVNHLQSIGQKPVGFKIVVSKVEEIEALVKTMVEIDTYPSFITVD 
GGEGGTGATFQELEDGVGLPLFTALPIVSSMLEKYGIRNKVKIFASGKLVTPDKIAIALG 
LGADLVNIARGMMISVGCIMSQQCHLNTCPVGVATTDPKKEKGLIVDEKQYRVTNYVTSL 

HEGLFNIAAAVGVHSPTEITSDHIIYRQLDGTTTSIQDYKLKLIS*

Sequence 1319 

Contig_0586_pos_4250_3699,

is similar to (with p-value 2.0e-42)

>sp:sp|P44463|LIPA_HAEIN LIPOIC ACID SYNTHETASE (LIP-SYN).
>pir:pir|G64043|G64043 lipoate biosynthesis protein A (lipA)
homolog - Haemophilus influenzae (strain Rd KW20)
>gp:gp|U32688|U32688_5 Haemophilus influenzae Rd section 3 of
163 of the complete genome. NID: g1572966.

gtgtatgcagaaacagtacgtaaagtaagagaaagaaatccatttacaacaatagaaatt
ttaccatctgacatgggtggcgattatgaagcccttgaaacattaatggcttctagacca 
gacattcttaatcacaacattgaaacggttcgtcgcttaacaccaagagttcgagctcga 
gcaacttacgatagaactttacaatttttacgtcgttctaaagaattacaacctgatatt 
ccaacaaaatcaagtttgatggttgggttaggtgaaacgatggaagaaatttatgaaacg 

atggatgatttacgcgctaatgatgttgatatcttaactataggtcaatatttacaaccg
tctcgaaaacatttgaaagttgagaaatattatacgccattagaatttggtaaaatgaga 
aagattgcaatggaaaaaggatttaaacattgtcaagcaggacctttagtaagaagctca 
tatcatgctgatgagcaagtgaatgaagcagctaaagagaaacaacgccaaggtgaagaa 
caactcaattaa 

Sequence 1320 

VYAETVRKVRERNPFTTIEILPSDMGGDYEALETLMASRPDILNHNIETVRRLTPRVRAR
ATYDRTLQFLRRSKELQPDIPTKSSLMVGLGETMEEIYETMDDLRANDVDILTIGQYLQP 
SRKHLKVEKYYTPLEFGKMRKIAMEKGFKHCQAGPLVRSSYHADEQVNEAAKEKQRQGEE 
QLN*

Sequence 1321 

Contig_0586_pos_3512_3123,

putative peptide of unknown function 

atgattaaagtcgaccaacaatattttgaattgatagaagaatatagagaatgttttgat
gaggaaatattttcagctaggtattcggatatattagacaaatatgattatgtcgtaggt 
gactatggttacgatcaattacgcttaaaaggattttataaagatagtaataaaaaggca
gaaataagtaaacgattttcaagtatacaagattatatactagaatattgtaattttggt 
tgtccttattttgtagtcagacgattgtcaccaaatgaatttattgaagaaatagatgat 

aaagaagatatcattgataaattacatgatgttaagattcaacctactattcaagacaca
gaaaaacatacccaagctatagatcaatag 

Sequence 1322 

MIKVDQQYFELIEEYRECFDEEIFSARYSDILDKYDYVVGDYGYDQLRLKGFYKDSNKKA
EISKRFSSIQDYILEYCNFGCPYFVVRRLSPNEFIEEIDDKEDIIDKLHDVKIQPTIQDT
EKHTQAIDQ* 

Sequence 1323 
Contig_0586_pos_1348_452,

is similar to (with p-value 3.0e-63)

>gp:gp|D86240|D86240_1 Staphylococcus aureus gene for unknown
function and dlt operon dltA, dltB, dltC and dltD genes,
complete cds. NID: g1405333.

atgtgggaacacgatttaacgcccatgtctagagaatcatttcttgctaacgttgaagat
gcaacagcatgtgtgatcacattgagtgagcatattgatgaagaagtatttctcagggca 
caacaacttaaagtgattgctaatatggcagtaggttttgacaatattgatatttcatta 
gcaaagaaacatggtgtagttgttaccaatacacctcacgtactgacagagacaactgcc 
gaactagggtttacattgatgcttactgtagctcgtagaatcattgaagcgacatcatat 

attcaagagggtaagtggaaaagttggggaccctacttattatcaggaaaagatgtatac
ggtgcaactgttggtatttttggaatgggtgatataggaaaagcgtttgcacgtcgctta 
caagggtttgatgcacggataatatatcacaatcgcaaacgtgacttaaatgctgaaaga 
gatttaaatgctacatatgtaacgtttaaatctttacttgaacaaagtgattttattatt 
tgcacagcacctttaactaaggaaactgagaatcaatttgatgctcgagcttttaataaa 

atgaaaaatgatgctgtcttcattaatattggaagaggtgaaattgtagatgaagaagca
cttttagaagcattaaaaaatcatgagatacaagcctgtggtttagatgttacgcgtcaa 
gaacctattcaacctaatcatccaatactgaaattacctaacgctgtggtgttacctcac 
ataggaagtgcatcccaagtcactagaaatcgaatggtacaactttgtatagataatatt 
aaagcagtattaaataatgatgcaccaataaccccaataacctctttacacttttaa 

Sequence 1324 

MWEHDLTPMSRESFLANVEDATACVITLSEHIDEEVFLRAQQLKVIANMAVGFDNIDISL
AKKHGVVVTNTPHVLTETTAELGFTLMLTVARRIIEATSYIQEGKWKSWGPYLLSGKDVY
GATVGIFGMGDIGKAFARRLQGFDARIIYHNRKRDLNAERDLNATYVTFKSLLEQSDFII
CTAPLTKETENQFDARAFNKMKNDAVFINIGRGEIVDEEALLEALKNHEIQACGLDVTRQ
EPIQPNHPILKLPNAVVLPHIGSASQVTRNRMVQLCIDNIKAVLNNDAPITPITSLHF* 

Sequence 1325 

Contig_0589_pos_1181_1807,

putative peptide of unknown function

atgaattttaaaaagactgtagcaattgtcctaacgtcagcagtgttattagctggatgt 
actatagataaaaaagaaattaaaaaatatgatgatcaagtacaaaaagctatggaccaa 
gagaaaaccgttaatcaagtaagtaaaaaaataaacgaattagaagagaaaaagcaaaaa 
ttatttaaaaaggtaaatgataaagatcaaagcacacgtaaaaaagcagctgaagatata 

gttgaaaatgtaaaacaaagacaaaaagaatttgaaaaagaagagaaggctctagataat
tctgaaaaagcatttaaacaagccaagcaatatcttgaacatgtagaaaacaaagcaaag
aaaaaagaagttgaacaacttgatagtgctattaaagaaaaatataaatcacatgatgct 
tatgcaaaggcttacaaaaaagcacttaataaggaaaaagaactgttttcttatttgaat 
gaagataatgcaacacaatcggaagtagacggaaaatcgaaagatctttctaaagcatat 

aaagaaatgaataataaatttaatgcttactcaaaagccattgagaaagtaaaaagagaa
aaacaagatgtagaccaattaaaataa 

Sequence 1326 

MNFKKTVAIVLTSAVLLAGCTIDKKEIKKYDDQVQKAMDQEKTVNQVSKKINELEEKKQK
LFKKVNDKDQSTRKKAAEDIVENVKQRQKEFEKEEKALDNSEKAFKQAKQYLEHVENKAK
KKEVEQLDSAIKEKYKSHDAYAKAYKKALNKEKELFSYLNEDNATQSEVDGKSKDLSKAY
KEMNNKFNAYSKAIEKVKREKQDVDQLK*

Sequence 1327 
Contig_0589_pos_1978_3090,

is similar to (with p-value 0.0e+00)

>pir:pir|S10798|DEBSPF pyruvate dehydrogenase (lipoamide)
(EC 1.2.4.1) alpha chain - Bacillus stearothermophilus
>gp:gp|X53560|BSPDMC_3 B. stearothermophilus pdhA, pdhB, pdhC,
pdhD genes for pyruvate dehydrogenase multienzyme complex
(E.C. numbers 1.2.4.1, 2.3.1.12, 1.8.1.4). NID: g40038.
atggctcctaagttacaagcccaattcgatgcagttaaagttttaaatgagactcaatcg 
aaatttgaaatggttcaaattttggatgaagacggaaatgtcgttaatgaagacttagta 
cctgatttaacagacgaacaattagtggaattaatggaaagaatggtatggactagaatt
cttgatcaacgttctatttcgttaaatagacaaggacgtttaggtttctatgcaccaaca 
gcaggacaagaagcttcacaattagcatctcagtatgctttagaaagtgaagacttcatt 
ttacctggttatcgtgatgtgcctcagattatttggcatggcttacctcttacagacgca 
ttcttattctcaagaggacacttcaaaggtaaccaattccctgagggagttaatgcactt 

agccctcaaattattatcggtgcacaatatattcaaactgccggtgtagcgtttggactt
aaaaaacgtggcaaaaatgcagtcgcaattacttatacaggtgatggtggttcatcacaa 
ggtgacttctatgaaggaattaactttgcatctgcatacaaagcacctgcaatttttgta 
attcaaaacaataactatgccatctctacaccacgtagtaaacaaacagctgcagaaaca 
ttagcacaaaaggctatttcagttggtatccctggaattcaagttgatggtatggatgct 

ttagctgtttatcaagcaacattagaagcacgtgaacgtgcagtagcaggagaaggtcct
actgttatcgaaactttaacttatcgttatggaccacatactatggctggtgatgatcct 
actcgttatagaacttcagatgaagatgctgaatgggagaaaaaagacccattagtacgt 
ttcagaaaatatttagaagctaaaggtctttggaatgaagacaaagaaaatgaagtggtt 
gaacgtgcaaaatctgaaataaaagcagctattaaagaggctgacaatacagaaaaacaa 

actgttacttctctaatggatatcatgtatgaagaaatgcctcaaaatttagcagaacaa
tatgaaatttacaaagagaaggagtcgaagtaa 

Sequence 1328 

MAPKLQAQFDAVKVLNETQSKFEMVQILDEDGNVVNEDLVPDLTDEQLVELMERMVWTRI
LDQRSISLNRQGRLGFYAPTAGQEASQLASQYALESEDFILPGYRDVPQIIWHGLPLTDA
FLFSRGHFKGNQFPEGVNALSPQIIIGAQYIQTAGVAFGLKKRGKNAVAITYTGDGGSSQ 
GDFYEGINFASAYKAPAIFVIQNNNYAISTPRSKQTAAETLAQKAISVGIPGIQVDGMDA 
LAVYQATLEARERAVAGEGPTVIETLTYRYGPHTMAGDDPTRYRTSDEDAEWEKKDPLVR 
FRKYLEAKGLWNEDKENEVVERAKSEIKAAIKEADNTEKQTVTSLMDIMYEEMPQNLAEQ 
YEIYKEKESK*

Sequence 1329 

Contig_0589_pos_3094_4071,

is similar to (with p-value 0.0e+00)

>pir:pir|C36718|C36718 pyruvate dehydrogenase (lipoamide)
(EC 1.2.4.1) E1 beta chain precursor - Bacillus subtilis
>gp:gp|AF012285|AF012285_34 Bacillus subtilis mobA-nprE gene
region. NID: g3282109. >gp:gp|M57435|BACPYDHY_3 B. subtilis
pyruvate dehydrogenase complex genes, complete cds; PAL-related
lipoprotein (sip) gene, complete cds, lysine decarboxylase
(cad) gene, partial cds. NID: g143375.
>gp:gp|Z99111|BSUB0008_131 Bacillus subtilis complete genome
(section 8 of 21): from 1394791 to 1603020. NID: g2633699.

atggcacaaatgacaatggttcaagcgattaacgatgcgcttaaaagtgaactcaaaaga 
gacgaagacgttttagttttcggtgaagacgttggtgttaacggtggtgtattccgtgtt
actgaaggtttacaaaaagaatttggcgaagatcgagtatttgatacaccattagcagag 
tctggaattggtgggcttgcactaggcttagcagtgactggcttccgtcctgttatggaa 
attcaattcttaggattcgtttatgaagtatttgacgaagtagctggtcaaattgctcgt 
actcgtttccgttcaggtggaactaaaccagcgcctgttacaattcgtacaccttttggt 
ggtggcgtccacactccagagttgcatgctgataatttagaaggtatcttagctcaatca
cctggtttgaaagtagttattccatcaggtccttatgatgctaaaggattattaatttct 
tctattcaaagtaatgatccagttgtatatctagaacatatgaaattatatcgttctttc 
cgtgaagaggttcctgaagaagaatacaaaattgacattggaaaagccaatgttaaaaaa 
gaaggtaatgatattactctaatatcttacggggcaatggtacaagaatcactaaaagct 
gctgaagagttagaaaaagatggttattcagttgaagttattgacttacgtactgtacaa
ccaattgatatagatactttagtagcatcagttgagaaaactggacgtgctgtagttgta 
caagaagcacaacgtcaagctggtgtgggtgcacaagtggcagcagaattagcagagcga 
gcaattctttcattagaagctccaatagctcgagtagccgcatcagatacaatttatcca 
tttactcaagctgaaaacgtttggttaccaaataaaaaagatattatagagcaagctaag
gcaactttagaattctaa

Sequence 1330 

MAQMTMVQAINDALKSELKRDEDVLVFGEDVGVNGGVFRVTEGLQKEFGEDRVFDTPLAE 
SGIGGLALGLAVTGFRPVMEIQFLGFVYEVFDEVAGQIARTRFRSGGTKPAPVTIRTPFG
GGVHTPELHADNLEGILAQSPGLKVVIPSGPYDAKGLLISSIQSNDPVVYLEHMKLYRSF 
REEVPEEEYKIDIGKANVKKEGNDITLISYGAMVQESLKAAEELEKDGYSVEVIDLRTVQ
PIDIDTLVASVEKTGRAVVVQEAQRQAGVGAQVAAELAERAILSLEAPIARVAASDTIYP 
FTQAENVWLPNKKDIIEQAKATLEF*

Sequence 1331 

Contig_0589_pos_4202_5503,

is similar to (with p-value 0.0e+00)

>sp:sp|Q59821|ODP2_STAAU DIHYDROLIPOAMIDE ACETYLTRANSFERASE
COMPONENT (E2) OF PYRUVATE DEHYDROGENASE COMPLEX
(EC 2.3.1.12). >pir:pir|S19722|S19722 dihydrolipoamide
S-acetyltransferase (EC 2.3.1.12) chain E2 - Staphylococcus
aureus >gp:gp|X58434|SAPDHDNA_2 S. aureus pdhB, pdhC and pdhD
genes for pyruvate decarboxylase, dihydrolipoamide
acetyltransferase and dihydrolipoamide dehydrogenase.
NID: g48871.

gtggcatttgaatttagattacccgatatcggggaaggtatccacgaaggtgaaattgtt 
aaatggtttattaaagccggcgatacaattgaagaagatgatgtattagcagaagttcaa 
aatgataaatctgtagtagaaattccttctccagtaagtggtactgttgaagaagtgtta 
gtagatgaaggaacagtggcagtagtaggagatgtcatcgttaaaattgatgcacctgat 

gcagaagaaatgcaatttaaaggtcatggcgatgatgaggattctaagaaagaagaaaaa
gaacaagaatcaccagtgcaagaagaagcttcatcaactcaatcacaagaaaagacagaa
gtagatgaaagtaaaactgttaaagcgatgccgtcagtgcgtaagtatgcacgtgaaaat 
ggtgtcaatattaaagctgtaaatggttctggtaaaaatggacgaatcacaaaagaagac 
atcgatgcatacttaaatggtggtagttccgaagaaggttcaaacactagcgcagcatct 

gaatcaacttctagtgatgtcgttaatgcttctgcaacacaagcattaccagaaggcgac
ttccctgaaactacagaaaaaatacctgcaatgcgcaaagcaattgctaaagcaatggtt 
aattctaaacacactgcacctcatgttacattaatggatgaaattgatgtgcaagaatta 
tgggatcaccgtaagaaatttaaagaaattgctgctgaacaaggtacaaaacttactttc 
ttaccatatgttgttaaagcattagtttctgcacttaaaaaatatccagcacttaatact 

tctttcaatgaagaagctggagaggttgtacacaaacattactggaatattggtattgct
gcagatacggataaaggattattagtaccagtagttaaacatgccgatcgtaaatcaata 
ttcgaaatttctgatgaaattaatgaactagctgtaaaagcacgtgatggtaaattaact 
tcagaagaaatgaaaggtgcaacatgcacaattagtaatatcggttccgctggtggacaa 
tggttcactccagttatcaatcacccagaagtagctatcttaggaattggccgtatcgct 

caaaaacctatcgttaaagatggagaaattgtagctgcaccagtgttagctttatcatta
agctttgaccatagacaaatcgatggtgctactggacaaaatgctatgaatcacattaaa 
cgcttattaaataatccagaattattattaatggaggggtaa 

Sequence 1332 

VAFEFRLPDIGEGIHEGEIVKWFIKAGDTIEEDDVLAEVQNDKSVVEIPSPVSGTVEEVL
VDEGTVAVVGDVIVKIDAPDAEEMQFKGHGDDEDSKKEEKEQESPVQEEASSTQSQEKTE 
VDESKTVKAMPSVRKYARENGVNIKAVNGSGKNGRITKEDIDAYLNGGSSEEGSNTSAAS 
ESTSSDVVNASATQALPEGDFPETTEKIPAMRKAIAKAMVNSKHTAPHVTLMDEIDVQEL 
WDHRKKFKEIAAEQGTKLTFLPYVVKALVSALKKYPALNTSFNEEAGEVVHKHYWNIGIA 

ADTDKGLLVPVVKHADRKSIFEISDEINELAVKARDGKLTSEEMKGATCTISNIGSAGGQ
WFTPVINHPEVAILGIGRIAQKPIVKDGEIVAAPVLALSLSFDHRQIDGATGQNAMNHIK 
RLLNNPELLLMEG* 

Sequence 1333 
Contig_0589_pos_5508_0,

is similar to (with p-value 0.0e+00)

>pir:pir|S19723|S19723 dihydrolipoamide dehydrogenase
(EC 1.8.1.4) - Staphylococcus aureus >gp:gp|X58434|SAPDHDNA_3
S. aureus pdhB, pdhC and pdhD genes for pyruvate
decarboxylase, dihydrolipoamide acetyltransferase and
dihydrolipoamide dehydrogenase. NID: g48871.

atggtagttggagatttcccaattgaaacagatactattgtaataggagcaggtccaggt 
ggatatgtcgcagccattcgcgcggctcaattaggacaaaaggtaacaatcgttgagaaa 
ggtaatttaggtggtgtatgcttaaacgttggttgtataccttcaaaagcattactacat
gcttctcatcgctttgttgaagcgcaaaattcagaaaacttaggggtaattgctgaaagc 
gtttcgttaaactatcaaaaagttcaagaattcaagacttctgtagttaataaattaact 
ggcggtgttgaaggacttttaaaaggtaacaaagtagagattgttagaggtgaagcttat 
ttcgttgataacaatagtttacgtgtcatggacgaaaagagtgctcaaacttacaatttc 

aaacatgcgattatagctacaggttcaagaccaattgaaattccaaattttgaatttggt
aaacgtgttatcgattcaacaggagctttaaatctacaagaagtacctaacaaactagtt 
gtagttggtggcggatatatcggttctgaattaggtactgcttttgcaaactttggctct 
gaagttactatccttgaaggtgcaaaagatattttaggcggatttgaaaagcaaatgaca 
caacctgttaaaaaaggtatgaaagaaaaaggtatcgaaatcgttactgaagcaatggca 

aaatctgcagaagaaactgaaaatggtgtcaaagtaacttatgaggcaaaaggtgaggaa
caaactatcgaagctgattatgtattagttacagttggccgtcgccctaatactgatgaa 
ttaggattagaagaacttggtctgaaatttgctgatcgtggattactagaagtggacaaa 
caaagtcgtacttctattgaaaatatctttgcgattggagatattgtacctggattacca 
ttagctcacaaagctagttatgaaggtaaagttgctgctgaagcgatagatggtcaagcc 

gcagaggtagactatattggtatgccagcagtttgctttacagaaccagaattagcacaa
gttggttatactgaagctcaagcaaaagaagaaggtttatcaattaaagcttctaaattc 
ccttatgcagctaatggacgagctttatcattagatgatacaaatggttttgttaagtta 
attacacttaaagaagatgatacgcttattggagcacaagttgtaggtactggcgcatct 
gatattat 

Sequence 1334 

MVVGDFPIETDTIVIGAGPGGYVAAIRAAQLGQKVTIVEKGNLGGVCLNVGCIPSKALLH 
ASHRFVEAQNSENLGVIAESVSLNYQKVQEFKTSVVNKLTGGVEGLLKGNKVEIVRGEAY 
FVDNNSLRVMDEKSAQTYNFKHAIIATGSRPIEIPNFEFGKRVIDSTGALNLQEVPNKLV
VVGGGYIGSELGTAFANFGSEVTILEGAKDILGGFEKQMTQPVKKGMKEKGIEIVTEAMA
KSAEETENGVKVTYEAKGEEQTIEADYVLVTVGRRPNTDELGLEELGLKFADRGLLEVDK 
QSRTSIENIFAIGDIVPGLPLAHKASYEGKVAAEAIDGQAAEVDYIGMPAVCFTEPELAQ 
VGYTEAQAKEEGLSIKASKFPYAANGRALSLDDTNGFVKLITLKEDDTLIGAQVVGTGAS
DIX 

Sequence 1335 

Contig_0591_pos_416_943,

is similar to (with p-value 6.0e-32)

>sp:sp|Q06752|SYC_BACSU CYSTEINYL-TRNA SYNTHETASE
(EC 6.1.1.16) (CYSTEINE--TRNA LIGASE) (CYSRS).
>pir:pir|C53402|C53402 cysteine--tRNA ligase (EC 6.1.1.16) -
Bacillus subtilis >gp:gp|D26185|BAC180K_156 B. subtilis DNA,
180 kilobase region of replication origin. NID: g467326.
>gp:gp|L14580|BACGLUSYN_6 Bacillus subtilis glutamyl-tRNA
transferase (gltX), serine acetyltransferase (cysE), and
cysteinyl-tRNA synthetase (cysS) genes, complete cds's.
NID: g289278. >gp:gp|X73989|BSCTS_1 B. subtilis gene for
cysteinyl-tRNA synthetase. NID: g499302.
>gp:gp|Z99104|BSUB0001_94 Bacillus subtilis complete genome
(section 1 of 21): from 1 to 213080. NID: g2632267.

atgattagtgtgcattatcgtagcccaataaactacaatttagaattagtaggtgcggcg
cgaagtggtcttgaacgtatacgtaatagctacaagttaattgaggaaagagaacaaatt 
gcctcagatttggaagaacaatcagaatatatacaacaaatagataaaatactaaatcaa 
tttgaaacggtaatggatgatgactttaatactgctaatgcagtaactgcatggtatgac 
ttagctaaacttgcaaataaatatgtattagaaaatacaacttcaacaaaagttttaaat

agatttaaagaagtgtacagcatttttagtgacgtccttggtgtaccacttaagagtaaa
gaaactgaagagttactagatgaagacattgaacaattgattgaggagcgtaatgaagca 
cgtaaaaataaagatttcgctcgagcagatgaaattagagatatgttaaaagcacgtcat 
atcattttagaagataccccccaaggtgtaagatttaaacgtggctaa

Sequence 1336

MISVHYRSPINYNLELVGAARSGLERIRNSYKLIEEREQIASDLEEQSEYIQQIDKILNQ 
FETVMDDDFNTANAVTAWYDLAKLANKYVLENTTSTKVLNRFKEVYSIFSDVLGVPLKSK
ETEELLDEDIEQLIEERNEARKNKDFARADEIRDMLKARHIILEDTPQGVRFKRG* 

Sequence 1337 

Contig_0591_pos_948_1334,

putative peptide of unknown function 

atgaacgtaaaacttcttaatcctttaacattggcatatatgggtgatgcagtacttgat 
caacatgtgcgtgaatatatcgtgctaaaattacaaagtaaacctcctcgtttgcaccaa
gtatcgaaaagttacgtttcagcgaaaagtcaagctaagactttagagtatttgttagat 
attgactggtttacagaggaagagctaagtgttttaaaacgaggacgtaacgctaaaagt 
tatacaaaagctaaaaatactgacattcaaacttatcgtaaaagttcagcgttagaagct
gttatcggatttttatatttagaccatcaatcagaacgattagaaaacttattagaaaca 
attgttaggatagtggatgaaaggtag

Sequence 1338 

MNVKLLNPLTLAYMGDAVLDQHVREYIVLKLQSKPPRLHQVSKSYVSAKSQAKTLEYLLD 
IDWFTEEELSVLKRGRNAKSYTKAKNTDIQTYRKSSALEAVIGFLYLDHQSERLENLLET 
IVRIVDER*

Sequence 1339 

Contig_0591_pos_2685_3260,

putative peptide of unknown function 

atgaatctgaataaacagcaacatgaatatacagcactgtgtctatcgcaaacagaaaat
aaatcttctgaagaactatttgagtctttaatagaagagctaaagccactgatttacaat 
aaaataaggtatatctcccataataagtatgacattgaagacatgtatcaagagattgtt 
attaaattctaccgtgccttgcaaaaattcgactatcaacaaggtgtaccaatagaacac 
tatatttattttttaattcgttcggttaaatatgactatcttagaaaagtaaaagcgaat 

tataaacgtcaacctctacttgttaatgaatacattgttgaatataacgctactttggca
ttaaacgatatagaaagatcgataattagaaaagaattaacattagcttttaaaagaagc 
gaagtcaaactcagtcgaatggaaagacgtatcattcgattactacttaatgattacaag
ccaaaggagattgctatggttttaaatttggaatccaaagttgtttataatgcgattcaa 
cgtagtaaatgtaaacttaaaagaagttttgaataa 

Sequence 1340

MNLNKQQHEYTALCLSQTENKSSEELFESLIEELKPLIYNKIRYISHNKYDIEDMYQEIV 
IKFYRALQKFDYQQGVPIEHYIYFLIRSVKYDYLRKVKANYKRQPLLVNEYIVEYNATLA 
LNDIERSIIRKELTLAFKRSEVKLSRMERRIIRLLLNDYKPKEIAMVLNLESKVVYNAIQ
RSKCKLKRSFE*

Sequence 1341 

Contig_0591_pos_4275_3676,
is similar to (with p-value 3.0e-28)
>gp:gp|L14580|BACGLUSYN_7 Bacillus subtilis glutamyl-tRNA
transferase (gltX), serine acetyltransferase (cysE), and
cysteinyl-tRNA synthetase (cysS) genes, complete cds's.
NID: g289278.

gtgaatgtggaagatatagtgatagtaggtagacacgcagttaaagaagcaattatatca 
ggtcacgccataaataagattttgattcaagacggtataaaaaagcaacaaattaacgac
attttaaaaaatgcaaaatcacaaaaattaattgtacaaacggtaccaaaatctaaatta 
gattttttagcaaatgcacctcaccagggtgtggctgctttagtagccccatatgaatat 
gcaaacttcgatgaatttttacaaaaacaaaagaaaaaagcccgttattcaactgttatc 
attttagatggtttagaagacccgcataatcttggctctatattaagaacagcagatgct 
tctggtgttgatgcggttattatacctaaaagacgatcagttgcgctaacacagaccgtt
gcaaaagcttctacaggagcgattcagcatgttccggttataagggttactaatctttcg 
aaaactatcgacgaattaaaagacaacggcttttggattgcggggacagaagctaataat 
gcaacggattatagactaagaagaatatcactgttgatacaactattattgtatatttaa

Sequence 1342

VNVEDIVIVGRHAVKEAIISGHAINKILIQDGIKKQQINDILKNAKSQKLIVQTVPKSKL
DFLANAPHQGVAALVAPYEYANFDEFLQKQKKKARYSTVIILDGLEDPHNLGSILRTADA
SGVDAVIIPKRRSVALTQTVAKASTGAIQHVPVIRVTNLSKTIDELKDNGFWIAGTEANN
ATDYRLRRISLLIQLLLYI*

Sequence 1343 

Contig_0592_pos_10448_10146,
is similar to (with p-value 1.0e-19)

>gp:gp|AF011545|AF011545_3 Bacillus subtilis SapB (sapB),
OpuE (opuE), YedA (yedA) genes, complete cds, and YedB (yedB)
gene, partial cds. NID: g2465554.

atgactaaagtaacacgtgaagaagttgaacatattgctaatttagctagacttcaaatt 
tctcctgaagaaacagaagaaatggctaatactttagaaagtattttagattttgcgaaa
caaaatgatagtgccgatacagaaggtattgagccaacttatcacgtattagatttacaa 
aacgtattacgtgacgataaagcaatcgagggcattcctcaagaattagcattgaaaaat 
gcgaaagaaactgaagatggtcaatttaaagtgccatccatcatgaatggggaggacgct 
taa 

Sequence 1344

MTKVTREEVEHIANLARLQISPEETEEMANTLESILDFAKQNDSADTEGIEPTYHVLDLQ 
NVLRDDKAIEGIPQELALKNAKETEDGQFKVPSIMNGEDA*

Sequence 1345

Contig_0592_pos_10144_8687, 

is similar to (with p-value 0.0e+00)

>gp:gp|AF008553|AF008553_2 Bacillus subtilis Glu-tRNAGln
amidotransferase subunits C (gatC), A (gatA) and B (gatB)
genes, complete cds. NID: g2589193.

atgagtattcgttttgaatctatcgaaaaattaactgaattaatcaaaaataaagaaatt 
aaaccttctgatgtagtaaaagatatatacgcagctattgaagaaactgatccaacaatc 
aagtcattcttagctttagataaagaaaatgcaataaaaaaagccgaagaattagatgaa 
ttacaagctaaagatcaaatggatggtaaactatttggaattcctatgggaatcaaagat 

aatatcatcacaaaagatgtagaaactacatgtgcaagtaaaatgttagaaggatttgta
cctatttatgaatcaactgtaatgaacaaactacatgatgaaaacgcggttttaattggt 
aaattaaacatggatgagtttgcaatgggtggctctacagagacttcatattttaagaaa
acattaaatcctttcgatcacacagcagtaccaggaggatcttcaggtggttctgcagca 
gcggttgcagcaggtttagttccttttagtttagggtcagacactggtggttctattaga 

caacctgcatcttattgtggcgttgttggtatgaaaccaacttatggccgtgtatcacgt
ttcggtttagttgcatttgcttcttctttagatcaaattggaccaatcacgcgtaatgtt 
aaagataacgcattagtacttgaggcaatttccggtgttgatgcgaatgattctacaagc 
gcacctgttgatgatgtagattttacttctgatattggtaaagatattaaaggtcttaaa 
attgcattacctaaagaatatttaggtgagggtgtaagtgaagaagttaagacttctgta 

aaagaagcggttgaaacgttaaaatcacttggtgctgaagttgacgaagtctcattacca
aatacaaaatatggtattccatcatattatgttattgcgtcatcagaggcttcagcaaat 
ttagcgcgatttgatggtattagatatggatatcattctaaagaagcacaatcgttagaa 
gaattatataaaatgtctagatcggaaggctttggtgaagaagtcaaaagacgtatcttc 
ttaggtacttttgctttaagctcaggttattacgatgcatactataaaaaatctcaaaaa 

gttagaacgttaattaaaaatgattttgacaaagtatttgaatcttatgatgttgtggtt
ggaccgacagcacctacaacagcatttaatattggcgaagaaattgatgatcctttaaca 
atgtatgcgaatgacttattaactacaccagttaatcttgccggtttacctggtatttca 
gttccttgtggacaatcaaacggacgcccaattggtttacaattaattggtaaacctttt 
gacgaaaaaacgttatatcgtgtcgcttatcaatttgaaacacaatacaacttacatgac 

gcatacgaaaatttataa

Sequence 1346

MSIRFESIEKLTELIKNKEIKPSDVVKDIYAAIEETDPTIKSFLALDKENAIKKAEELDE 
LQAKDQMDGKLFGIPMGIKDNIITKDVETTCASKMLEGFVPIYESTVMNKLHDENAVLIG
KLNMDEFAMGGSTETSYFKKTLNPFDHTAVPGGSSGGSAAAVAAGLVPFSLGSDTGGSIR
QPASYCGVVGMKPTYGRVSRFGLVAFASSLDQIGPITRNVKDNALVLEAISGVDANDSTS
APVDDVDFTSDIGKDIKGLKIALPKEYLGEGVSEEVKTSVKEAVETLKSLGAEVDEVSLP
NTKYGIPSYYVIASSEASANLARFDGIRYGYHSKEAQSLEELYKMSRSEGFGEEVKRRIF
LGTFALSSGYYDAYYKKSQKVRTLIKNDFDKVFESYDVVVGPTAPTTAFNIGEEIDDPLT
MYANDLLTTPVNLAGLPGISVPCGQSNGRPIGLQLIGKPFDEKTLYRVAYQFETQYNLHD 
AYENL* 

Sequence 1347
Contig_0592_pos_8683_7247,

is similar to (with p-value 0.0e+00)

>sp:sp|Q45486|YZDD_BACSU PET112-LIKE PROTEIN.
>gp:gp|U49790|BSU49790_1 Bacillus subtilis PET112-like protein
gene, complete cds. NID: g1354210.

gtggaaatcatgcattttgaaacagtaatcggacttgaagttcatgttgagttaaaaacg
gactcaaaaatgttctctccatcacccgcacattttggagctgaaccaaattcaaataca 
aatgttatcgacttagcttatccaggtgtattaccagtagttaatagacgtgcagtagat 
tgggcaatgagagcttcaatggcattaaatatggatattgctacaaattcaaaatttgat 
cgtaaaaactatttctatccagataatccaaaagcatatcaaatttctcagtttgatcaa 

cctattggagaaaatggctatattgatattgaagttgatggagaaacaaaacgtatcggt
attacacgtcttcatatggaagaagatgcaggtaaatcaacacataaagatggttattct 
ctagtagacttaaaccgtcaaggtacgccattaattgaaattgtatctgaacccgatatt 
cgttcacctaaagaagcatatgcttatctagaaaaactacgttcaatcattcaatataca 
ggtgtatctgattgtaaaatggaagagggatccctacgttgtgatgctaatatttcactt 

cgtccatatggtcaaaaggaatttggtacaaaaactgaattgaaaaaccttaactcattt
aactacgttaaaaaaggtttagaatatgaagagaaacgtcaagaagaagaattattaaat 
ggtggagagattggtcaagaaacacgtcgatttgatgaatctactggtaaaacaatttta 
atgcgtgtgaaagaaggttcagatgattatagatatttccctgaaccagatattgtacca 
ttatatgtagatgaagattggaaagcacgtgtaagagaaacaattccagaattgccagat 

gaacgtaaagctaaatacgtaaatgatcttggactaccagaatatgatgcgcatgtatta
acattaactaaagaaatgtctgatttctttgaaggcgcaattgaccatggtgcagatgtt
aaacttacttccaactggttaatgggaggtgttaacgagtatcttaataaaaatcaagtt 
gaattaaaagatacgcaactaacacctgaaaatttagctggtatgattaaattaatagaa 
gacggaacaatgagtagtaaaatcgctaaaaaagtttttccagaactagcagaaaatggt 

ggagatgctaaacaaattatggaagataaaggtttagtacaaatttctgatgaagcaaca
ctacttaaatttgtaacagatgcattagataataatccacaatcaatagaagattataaa 
aatggtaaaggtaaagctatgggattcttagtgggccaaattatgaaagcttctaaaggt 
caagctaacccacaaaaagttaatagcctattaaaacaagaattagataaccgttaa 

Sequence 1348

VEIMHFETVIGLEVHVELKTDSKMFSPSPAHFGAEPNSNTNVIDLAYPGVLPVVNRRAVD 
WAMRASMALNMDIATNSKFDRKNYFYPDNPKAYQISQFDQPIGENGYIDIEVDGETKRIG 
ITRLHMEEDAGKSTHKDGYSLVDLNRQGTPLIEIVSEPDIRSPKEAYAYLEKLRSIIQYT 
GVSDCKMEEGSLRCDANISLRPYGQKEFGTKTELKNLNSFNYVKKGLEYEEKRQEEELLN 

GGEIGQETRRFDESTGKTILMRVKEGSDDYRYFPEPDIVPLYVDEDWKARVRETIPELPD
ERKAKYVNDLGLPEYDAHVLTLTKEMSDFFEGAIDHGADVKLTSNWLMGGVNEYLNKNQV 
ELKDTQLTPENLAGMIKLIEDGTMSSKIAKKVFPELAENGGDAKQIMEDKGLVQISDEAT 
LLKFVTDALDNNPQSIEDYKNGKGKAMGFLVGQIMKASKGQANPQKVNSLLKQELDNR*

Sequence 1349

Contig_0592_pos_7022_6072, 

is similar to (with p-value 2.0e-29)

>sp:sp|P39074|BMRU_BACSU BMRU PROTEIN.
>gp:gp|L25604|BACBMRURBE_1 Bacillus subtilis bmrU, multidrug
efflux transporter (bmr) and its regulator (bmrR) genes,
complete cds, and branched-chain 2-oxo acid dehydrogenase
(bfmB) gene, 3' end. NID: g2558636.
>gp:gp|D84432|BACJH642_251 Bacillus subtilis DNA, 283 Kb
region containing skin element. NID: g2627063.
>gp:gp|Z99116|BSUB0013_111 Bacillus subtilis complete genome
(section 13 of 21): from 2395261 to 2613730. NID: g2634723.
atgagaaaacgtgcaagaattatatataatccaacatcaggaaaagaactttttaaacgt 
gtattaccagatgcactgattaaacttgagaaggcaggttatgaaacgagtgcatatgca 
actgaaaaaattggtgatgctacttttgaagctgaaagagcactagaaagtgaatatgat 
ttactcattgcagctggaggtgacggtacgttaaatgaggtggtcaacggaatcgccgaa
caacccaatcggcctaaattaggtgtaataccaatgggcaccgttaatgactttggaaga 
gcacttcatttaccaagcgatataatgggggcgattgatgtaatcattgatggtcacaca 
accaaggtagatattggaaagatgaataatcgttatttcattaacctagctgcagggggg 
aaactaacacaagtatcttatgaaacaccaagtaagttgaaatcaattgtaggaccgttc 

gcgtattacattaaaggattcgaaatgttacctcaaatgaaagcagtagatgtacgtatc
gaatatgatgataacatcttccaaggagaagctttactattccttttaggtttaacgaat 
tcaatggctggctttgaaaaattagttccagatgcgaagcttgacgacggttatttcacg 
ttaattattgtagaaaaagcaaatcttgctgaattgggtcatattatgacactagcgtct 
cgaggtgagcatacaaaacatcctaaagtcatttatgctaaagcgaagtctattaatatt 

tcatcatttactgatatgcaacttaatgttgatggtgaatacggtgggaaattacctgca
aatttccttaatttagaacagcacatagaaatttttacacctaaagatgtatttaacgaa 
gaactattagaaaatgatacgataactgatattacgcctgataagcaataa 

Sequence 1350 

MRKRARIIYNPTSGKELFKRVLPDALIKLEKAGYETSAYATEKIGDATFEAERALESEYD
LLIAAGGDGTLNEVVNGIAEQPNRPKLGVIPMGTVNDFGRALHLPSDIMGAIDVIIDGHT
TKVDIGKMNNRYFINLAAGGKLTQVSYETPSKLKSIVGPFAYYIKGFEMLPQMKAVDVRI
EYDDNIFQGEALLFLLGLTNSMAGFEKLVPDAKLDDGYFTLIIVEKANLAELGHIMTLAS
RGEHTKHPKVIYAKAKSINISSFTDMQLNVDGEYGGKLPANFLNLEQHIEIFTPKDVFNE

ELLENDTITDITPDKQ*
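
The record labels in this listing, such as Contig_0592_pos_7022_6072, appear to encode a contig number followed by two nucleotide coordinates. The following is a minimal Python sketch for reading such a label. It assumes the layout Contig_<number>_pos_<first>_<second>, and it assumes that a first coordinate larger than the second indicates a reverse-strand reading frame; neither assumption is stated in the listing itself. The names `ContigLocus` and `parse_label` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import NamedTuple


class ContigLocus(NamedTuple):
    """Hypothetical container for one 'Contig_NNNN_pos_A_B' label."""
    contig: str
    first: int
    second: int

    @property
    def length(self) -> int:
        # Inclusive span between the two coordinates.
        return abs(self.first - self.second) + 1

    @property
    def reverse(self) -> bool:
        # Assumption: first > second means the frame is on the reverse strand.
        return self.first > self.second


# Assumed label layout; a trailing comma or surrounding text is tolerated.
_LABEL = re.compile(r"Contig_(\d+)_pos_(\d+)_(\d+)")


def parse_label(label: str) -> ContigLocus:
    match = _LABEL.search(label)
    if match is None:
        raise ValueError(f"not a contig label: {label!r}")
    return ContigLocus(match.group(1), int(match.group(2)), int(match.group(3)))


locus = parse_label("Contig_0592_pos_7022_6072,")
print(locus.contig, locus.length, locus.reverse)  # prints: 0592 951 True
```

For Contig_0592_pos_7022_6072 the inclusive span is 951 nucleotides, which agrees with the length of the nucleotide sequence printed for Sequence 1349. Labels whose second coordinate is 0 (for example Contig_0593_pos_10691_0) do not fit this simple length rule and are not treated specially by the sketch.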

Sequence 1351 

Contig_0592_pos_6040_4622,
is similar to (with p-value 0.0e+00) 
>gp:gp|Z99108|BSUB0005_71 Bacillus subtilis complete genome
(section 5 of 21): from 802821 to 1011250. NID: g2633055.
>gp:gp|D78509|D785099 Bacillus subtilis YfjG-YfjR genes, complete
cds. NID: g2780390.

atgccacacattatattcctaacctttttgtatgtgggagggaaaaaattggaaacaatt 

aagaaaaacgaagttaaaacgggaaaggttattgatttaactcatgagggacacggagtt
gttaaagttgatcgatatccaatttttattcctaatgctttaattgatgaagaaattaaa
tttaaattaattaaagtgaaaaagaattttgctataggaaaattgatagaggtcataagt 
gaaagtgatgatagagtgacaccaccttgtatttattatgcaaagtgtggtggttgtcaa 
ttacaacatatgacatatagagcgcaattggatatgaaaagagaacaagtagttaatctt 

tttcatagaaaaggcccttttgagaatacggttataaaggaaactattggcatggtcaat
ccctggcgataccgtaataaatctcaaattcctgtaggtcaaagtaactcgaatcaagtt 
ataatgggattctatagacaacgtagccatgacattatagatatggatagttgtcttata 
caagatagacaacatcaagaagtaatgaatcgagtgaagtactggctcaatgaattaaat 
atatctatatataacgaaaaaacaaaaacaggtttaatacgtcatttagtagtaaggact 

ggttatcataccgatgaaatgatggttatctttgttacaaatggagcaacatttaaacaa
tcagaactattagtaaacaagctaaaaaaagaatttccaaatataacaagtattaaacag 
aatataaacaatagccattctaatgttataatggggcgtcaatcaatgactttatatggt 
aaagataaaattgaagaccaattaagtgaagtaacttatcatatttctgatttatcattt 
taccaaattaactcatcacaaactgaaaaactttatcagcaagctctgaattatgctcaa 

ttaacaggaaaagaaatagtattggatacgtattgtggtataggaacgattggtctatat
atggcaccactagcaaaacatgtttatggtgtagaagttgttccgcaagccataaaagat 
gcggaagacaatgcgactaaaaaccaacttaaaaatacgactttcgaatgtggaaaagca
gaagatgttatcttaacatggaaatcacaagggattaaaccaggcgtagtcatggtagat 
ccacctagaaaaggatgcgatgaaactttcttaactactcttttaaaattaaatccgaaa 

aggattgtttatatatcatgtaacccttcaacgcaacaaagagatgcgcaaatattggct
gaacaatacgagttagtagaaattacaccagttgatatgttcccacaaacaactcatatt 
gagactgtagcattatttgtacgtaaagacgaagaatga 

Sequence 1352

MPHIIFLTFLYVGGKKLETIKKNEVKTGKVIDLTHEGHGVVKVDRYPIFIPNALIDEEIK
FKLIKVKKNFAIGKLIEVISESDDRVTPPCIYYAKCGGCQLQHMTYRAQLDMKREQVVNL
FHRKGPFENTVIKETIGMVNPWRYRNKSQIPVGQSNSNQVIMGFYRQRSHDIIDMDSCLI 
QDRQHQEVMNRVKYWLNELNISIYNEKTKTGLIRHLVVRTGYHTDEMMVIFVTNGATFKQ 
SELLVNKLKKEFPNITSIKQNINNSHSNVIMGRQSMTLYGKDKIEDQLSEVTYHISDLSF
YQINSSQTEKLYQQALNYAQLTGKEIVLDTYCGIGTIGLYMAPLAKHVYGVEVVPQAIKD 
AEDNATKNQLKNTTFECGKAEDVILTWKSQGIKPGVVMVDPPRKGCDETFLTTLLKLNPK 
RIVYISCNPSTQQRDAQILAEQYELVEITPVDMFPQTTHIETVALFVRKDEE* 

Sequence 1353

Contig_0592_pos_4484_3936,

putative peptide of unknown function 

gtgaatgtgcataaaatagatttatcaggcaacaaatttcaaatccaacgatttgttctg 
ttgcaaattgtattggcgctatttacaatactatttacttataaatgggcttatcaaaca 

acgcatatcattgaacaaaatcttgtcatgaatcttatttttggatttgtaggtttcgca
gtactagttattttgcacgagtttattcatcgtattttgttcattatattttctaaaggt 
gaaaaaccatctttaaaatatgataaaaacaaaattattgtacagttctctcagacttgt 
tttcatcggtggcaatttacaattatcatgatagcaccacttgttatcataagtgcgacc
ttactagcacttattcaaatatattccttctcatctttaatctttatgtttagtatacat 

acaagttattgtatgatagatgtgtttttagtagcattggcattacaaagcaaattcaaa
tacatacaaacctatggagaaggtttgtatctttatcatcaaaagcctactcaaacctat 
tatgaataa 

Sequence 1354 

VNVHKIDLSGNKFQIQRFVLLQIVLALFTILFTYKWAYQTTHIIEQNLVMNLIFGFVGFA
VLVILHEFIHRILFIIFSKGEKPSLKYDKNKIIVQFSQTCFHRWQFTIIMIAPLVIISAT
LLALIQIYSFSSLIFMFSIHTSYCMIDVFLVALALQSKFKYIQTYGEGLYLYHQKPTQTY
YE* 

Sequence 1355

Contig_0592_pos_3018_1894, 

is similar to (with p-value 8.0e-28) 

>sp:sp|P23479|SBCD_BACSU EXONUCLEASE SBCD HOMOLOG (FRAGMENT)

atgaaaattgtacataccgctgattggcatctgggtaaaattttaaatggaaaacaattg
cttgaagatcaaaaatatattttaactcagtttaaacaacatatggagaaagaacagcca 
gatttaatagtaattgcaggtgatttgtatgatacctcatatccaagtaaagaagcgata 
ggtttacttgaagagactattgaatacctaaatatagaacttaaaattccaataatcatg 
ataagcggtaaccatgatggtagggagagattgaattatggctctaaatggtttgagaat 

aatcaactttacataagaactcaactagaaaatattgatgatccaatagaattgagtggt
gttcaatttttcactttacctttcgcaactgtgagtgaagtacaaaattattttaaggat 
aagcaaatagaaacatatcaacaagcattaaacgaatgcttagagcaaatgtctagttcc 
atagataataataaggtgaatatattaattggtcatttaactattgagggcggtaaaact 
tcagattcggaaagaccattaactattggaacagtagaatcagttgatatgcattctttt 

cggttgtttgattatgtaatgctcgggcacctacatcatccatttagtataaataactct
tttatcaaatatagcggttcgattttgcaatactctttctctgaagtaaatcaatctaaa 
ggatatagagttcttgatattgaaaacaaccaactattaaatgaaaccttcgttccttta 
aaacctctaagagaactagaagttattgaaggtgattatgaggatattattcaagaaaga 
attaaagtaaaaaataaaaataattattttcattttaagttaacgaatgtttctcatatt 

actgatccaatgatgaaactgaaacaaatttatcccaatatattagcactatcgaatgta
gtatttgatcatagtgagaattttagccatgttgaaatcaaaaaacaagatgatcagaca 
attatagaaaatttttataaaaatatgacagatcaacatctgagtcaagttcaatcagac 
aaaataaagcacttgttaagttttatattggatagggaggggtaa 

Sequence 1356

MKIVHTADWHLGKILNGKQLLEDQKYILTQFKQHMEKEQPDLIVIAGDLYDTSYPSKEAI
GLLEETIEYLNIELKIPIIMISGNHDGRERLNYGSKWFENNQLYIRTQLENIDDPIELSG
VQFFTLPFATVSEVQNYFKDKQIETYQQALNECLEQMSSSIDNNKVNILIGHLTIEGGKT 
SDSERPLTIGTVESVDMHSFRLFDYVMLGHLHHPFSINNSFIKYSGSILQYSFSEVNQSK
GYRVLDIENNQLLNETFVPLKPLRELEVIEGDYEDIIQERIKVKNKNNYFHFKLTNVSHI
TDPMMKLKQIYPNILALSNVVFDHSENFSHVEIKKQDDQTIIENFYKNMTDQHLSQVQSD 
KIKHLLSFILDREG* 

Sequence 1357

Contig_0592_pos_1869_118,

putative peptide of unknown function 

atggagaactttggcccttttattaaagaaactattgattttgagcaagttgaaactgat 
caactctttttaattagtggtaaaactggatctggtaaaacaatgatttttgatgctata 

gtatacgcattatacggtatggcttcgaccaaaactagaaaagaaggagatttaagaagt
cattttgcagacggtaaatcgccaatgtctgtaatttatcaatttaaagttaataatcaa 
acttttaaaattcatagagaagcgccatttattaaagaggggaatataactaaaacacaa 
gccaagttaaatatatatgaattagttgataatcaatttgaattaagagaaagtaaagtg
aatcaaggtaatcaatttatcgtacaattattaggcgttaatgctgaacaatttcgtcaa 

ttatttattttgcctcaaggagaatttaaaaagtttcttcagtcaaatagtaaagacaaa
caatcgattcttagaacactttttaatagtgagcgatttgatgagattagacatctactt 
gtagaaaatgtaaagcaagaaaaagtacaaattgaaaatagatacactcaaattgaaaat 
ttatggaatgatatagatacatttaataatgatgaattggccttatataaagaattagag 
agttctcagacagataaaatgattgaaaaattcccacaatttaatgattatggatgcaaa 

attctcaagtcatttgaagaagctaagaataaaataactaaggaattagatgatttaaat
cataaatataaagtgaatgttgaattaagtgagaatactaaaaaattaaaagcggaaaaa 
atcaaatttgacgatttgaaaaaagaacaaaattatattgataaattaaagcaagaatta 
aaaatgattcaggaatctaaagtattaatcacttattttactaggttacaaagtttaaaa 
aaagataaagatgaattagtgtcacttcatgagcaatcaaaattaaacgaaacaaactat

cacaatgaaattaaaggttttcaaaaacaactcgaacatttatcaacacgagaaaatgaa
ataactcaatttaatcagtatctagaaaaaaaccaagttttcttcaatcaattagataag 
attattagtagttatcaacaaaaaccggtaattgaagaagaaataaaaagattatacagt 
gaatataatgatttaataaccaaaaaagaagaattgacgaaagaaatgaacaacaagaac 
aaagattttgcaattattgaacattacactgaagagatttataagctgaaaaagattata 

gatgaatctgaaagacaaaaaaaggatgagaaattatttgataaattacaactagataaa
tcatcttatcttagcaaattaaaagagaagaaagaacagttaaatgaaattgaatcatca
atcaccaatatagatgcgactttaattgatttgaatgacaaaaaggattttgtaaatgaa 
ataaagtccgctatgtcaattggagatacctgtccaatttgtggtaatgaaatacattca 
ttgggagaacatattgattttgaatcaattgctcaaaaaaataataaaataaaacggtta 

gaaagtaagaaggtaaaaattcgtgatgaaataatcaaaatagaaactcgaattgaatct
aacgctttacctccgcctaaaccaataacaacgtcaatgtttttttctttatattgttta 
actatacgttga 

Sequence 1358 

MENFGPFIKETIDFEQVETDQLFLISGKTGSGKTMIFDAIVYALYGMASTKTRKEGDLRS
HFADGKSPMSVIYQFKVNNQTFKIHREAPFIKEGNITKTQAKLNIYELVDNQFELRESKV
NQGNQFIVQLLGVNAEQFRQLFILPQGEFKKFLQSNSKDKQSILRTLFNSERFDEIRHLL 
VENVKQEKVQIENRYTQIENLWNDIDTFNNDELALYKELESSQTDKMIEKFPQFNDYGCK 
ILKSFEEAKNKITKELDDLNHKYKVNVELSENTKKLKAEKIKFDDLKKEQNYIDKLKQEL 

KMIQESKVLITYFTRLQSLKKDKDELVSLHEQSKLNETNYHNEIKGFQKQLEHLSTRENE
ITQFNQYLEKNQVFFNQLDKIISSYQQKPVIEEEIKRLYSEYNDLITKKEELTKEMNNKN
KDFAIIEHYTEEIYKLKKIIDESERQKKDEKLFDKLQLDKSSYLSKLKEKKEQLNEIESS
ITNIDATLIDLNDKKDFVNEIKSAMSIGDTCPICGNEIHSLGEHIDFESIAQKNNKIKRL 
ESKKVKIRDEIIKIETRIESNALPPPKPITTSMFFSLYCLTIR*

Sequence 1359 

Contig_0593_pos_9545_10618,

is similar to (with p-value 0.0e+00) 

>sp:sp|P25811|THDF_BACSU POSSIBLE THIOPHENE AND FURAN OXIDATION
PROTEIN THDF. >pir:pir|JQ1215|JQ1215 hypothetical 50K protein -
Bacillus subtilis >gp:gp|D26185|BAC180K_60 B. subtilis DNA, 180
kilobase region of replication origin. NID: g467326.
>gp:gp|X62539|BSORIGS_5 B. subtilis genes rpmH, rnpA, 50kd, gidA and
gidB. NID: g40020. >gp:gp|Z99124|BSUB0021_207 Bacillus subtilis
complete genome (section 21 of 21): from 3999281 to 4214814. NID:
g2636442.

atgattacaattactttcatatttttgagaatagtgccaaaaggtaataccaacttctcc 
attattatagatttatctcaagcagaagcggttatggattttatacgttccaaaactgat 
cgagcttctaaggttgcgatgaatcaaatagaaggacgtttaagtgacttaatcaagaaa
caacgtcaatccatattagagatactcgcccaagttgaagttaacattgattatccagag 
tatgatgatgtagaagacgcaacgacggacttcttactagaacagtctaagcgtattaaa 
gaagaaatcaatcagttacttgaaacaggagcacaaggtaaaataatgagagaagggtta 
tctacagttattgtaggacgtcctaatgttgggaagtcttcgatgctaaataaccttatt 

caagataataaagcaattgtgactgaggtcgctggtacaacaagagacgtgttagaagaa
tatgtcaatgttagaggtgtcccgttacgacttgtagatactgcgggtattagggatact 
gaagatatcgtagagaagattggtgtagaacgttctaggaaagctttaagtgaagcagat 
ttaattttatttgtgcttaataacaatgaacctctgacggaagatgatcaaactttattc 
gaagtcattaaaaatgaggatgttattgtaatcattaataaaacagatttagaacagcga 

ttagatgttagcgaactaagagagatgattggtgatatgccacttatacaaacatcgatg
cttaaacaagaaggtattgatgaattagaaatacaaattaaagatttattctttggtggc 
gaagtacaaaatcaagatatgacttatgtatctaattcacgtcacatttcattgttgaaa 
caagcgagacaatcaattcaagatgcgattgatgctgctgagtctggtatcccaatggat 
atggtacagattgatttaacacgtacttgggaaattctaggagaaattattggagaatca 

gcgagtgatgaattaatagatcaactatttagtcaattttgtttaggaaaataa

Sequence 1360 

MITITFIFLRIVPKGNTNFSIIIDLSQAEAVMDFIRSKTDRASKVAMNQIEGRLSDLIKK 
QRQSILEILAQVEVNIDYPEYDDVEDATTDFLLEQSKRIKEEINQLLETGAQGKIMREGL 
STVIVGRPNVGKSSMLNNLIQDNKAIVTEVAGTTRDVLEEYVNVRGVPLRLVDTAGIRDT
EDIVEKIGVERSRKALSEADLILFVLNNNEPLTEDDQTLFEVIKNEDVIVIINKTDLEQR 
LDVSELREMIGDMPLIQTSMLKQEGIDELEIQIKDLFFGGEVQNQDMTYVSNSRHISLLK
QARQSIQDAIDAAESGIPMDMVQIDLTRTWEILGEIIGESASDELIDQLFSQFCLGK*

Sequence 1361

Contig_0593_pos_10691_0, 

is similar to (with p-value 0.0e+00) 

>sp:sp|P25812|GIDA_BACSU GLUCOSE INHIBITED DIVISION PROTEIN A.
>pir:pir|JQ1216|BWBSGA gidA protein - Bacillus subtilis
>gp:gp|D26185|BAC180K_59 B. subtilis DNA, 180 kilobase region of
replication origin. NID: g467326. >gp:gp|X62539|BSORIGS_6 B. subtilis
genes rpmH, rnpA, 50kd, gidA and gidB. NID: g40020.
>gp:gp|Z99124|BSUB0021_206 Bacillus subtilis complete genome (section
21 of 21): from 3999281 to 4214814. NID: g2636442.

gtggttcaagaatatgatgtagtagtcattggtgctggtcacgccggtattgaagcaggt 
ctagcttcagctcgccggggtgctaaaacactgatgttaacaattaatttagataatatt 
gctttcatgccatgtaatccatctgtaggtggtcctgcgaaaggaatcgttgtacgtgaa 
atagacgctttaggtggacaaatggcaaaaactattgataaaactcacattcaaatgcgt 

atgcttaatacaggtaaaggtccagctgttagagctttacgtgctcaagcagataaagta
ttatatcaacaagaaatgaagcgtgtacttgagaatgaggataatttagacatcatgcaa 
ggtatggttgatgaactcattatagaagataatgaagttaaaggtgttcgtactaatatt
ggtacagaatatcgttctaaagctgtcattattacaacaggtacattcttacgtggagaa 
attatactaggaaacttaaaatattctagtggccctaaccatcaattaccatctgtaact 

ctagcggataatttaagaaaattaggatttgatatcgttagatttaaaacgggtacacca
ccacgtgtaaatgcgagaaccatcgattattctaaaactgaaatccaaccaggtgatgat
ataggtcgagcgtttagttttgaaacaaccgaatttattttagatcaattaccttgttgg
ttaacttatacaaatggagatacacatcaagtcattgatgataacttacatttatctgct 
atgtattccggtatgattaaaggtacaggtcctagatattgtccatcaattgaggataaa 

tttgtccgctttaacgataaaccaagacatcaacttttcttagaacctgaaggacgtaat
acgaatgaggtata 

Sequence 1362 

VVQEYDVVVIGAGHAGIEAGLASARRGAKTLMLTINLDNIAFMPCNPSVGGPAKGIVVRE
IDALGGQMAKTIDKTHIQMRMLNTGKGPAVRALRAQADKVLYQQEMKRVLENEDNLDIMQ
GMVDELIIEDNEVKGVRTNIGTEYRSKAVIITTGTFLRGEIILGNLKYSSGPNHQLPSVT
LADNLRKLGFDIVRFKTGTPPRVNARTIDYSKTEIQPGDDIGRAFSFETTEFILDQLPCW
LTYTNGDTHQVIDDNLHLSAMYSGMIKGTGPRYCPSIEDKFVRFNDKPRHQLFLEPEGRN
TNEVX

Sequence 1363 

Contig_0593_pos_9014_8610, 

putative peptide of unknown function 

gtgtcagtaactgtaaaaggacaaactgaaacagaatggcttccagtattggattttaga
aacaaatctttagcaaagggtagcgcgacaacatttgatattaataaagctcaaaaacgt
tgtttcgttaaagctgcagcattacatggcctaggtctttatatatacaacggggaagaa
gttccaagcgctaacgacaatgacattacagaattagaagagcgtatcaaccagtttgta
acttcatctcaagaaaaaggtagagacgcaacgctagacaaaacaatgcgttggttaggt
attcaaaacattaacaaagttactaaaaaagatatagcaaatgcacatcaaaaactagat
gcaggactaaaacaattagataaggagaattcaaatgttaaatag

Sequence 1364 

VSVTVKGQTETEWLPVLDFRNKSLAKGSATTFDINKAQKRCFVKAAALHGLGLYIYNGEE
VPSANDNDITELEERINQFVTSSQEKGRDATLDKTMRWLGIQNINKVTKKDIANAHQKLD
AGLKQLDKENSNVK* 

Sequence 1365 
Contig_0593_pos_8163_7489,

putative peptide of unknown function

atggtagtaataaaaaactacattacagaagatgacggtacaacaactgtagtcatcaaa 
ggagtagaactagataacaaaacatctttacttttagacaacggttacgaagtagaagca 
gatgtaagagttgtagatccattcaagattacagataagcagcgtagaaaagtatttgct 
ctctgtaacgacatagaagcttacacaggacaaccacgcgactatatgaggtatttgttc 

atggattacgtagaagttctctatggctatgaaaaacgtctctcattgagtgattgcaca
agagaacaagctaaacaagttatagaagttattcttgactgggtgtttcacaacaatata 
ccacttaattataagacaagtgacttactcaaaaatgataaagcgttcctttactggtca 
acagtcaatcgtaactgtgtaatatgcggaacgccacgagcagaacttgcgcattatcac 
acagtaggtcgaggacgtaacagacgaaagatagatcacacagacaacaaagtattagcg 

ctatgttcaagacatcataaagagcagcaccaaataggtatagatagttttaatgagaaa
tacaaattacatgaaagttgggtgtccgtagatgaacgactcaaccgaatgttgaaagga 
gaagtaaatggctga 

Sequence 1366 

MVVIKNYITEDDGTTTVVIKGVELDNKTSLLLDNGYEVEADVRVVDPFKITDKQRRKVFA
LCNDIEAYTGQPRDYMRYLFMDYVEVLYGYEKRLSLSDCTREQAKQVIEVILDWVFHNNI
PLNYKTSDLLKNDKAFLYWSTVNRNCVICGTPRAELAHYHTVGRGRNRRKIDHTDNKVLA 
LCSRHHKEQHQIGIDSFNEKYKLHESWVSVDERLNRMLKGEVNG*

Sequence 1367

Contig_0593_pos_7460_6702,

putative peptide of unknown function 

atgttcgatgatagcaaaatcaagtatatagaagcactgccagaacgagatacaatcatc 
actttatgggttaagttgctgacattagctggaaagtataacgaacaaggatacattatg 

ttatccgaaagtctaccctataacgaagaaatgttagctaacgaatttaatagacctatc
aattcaataagattagcgttacaaacattcgaaaagctaagcatgattgaagaagtgaat 
ggtgtctttaaagtatctaattgggaaaaacatcagaacatcgaaggtttagaaaagata 
agagaacaaaaccgtttgcgtaaacaaaagcaaagaaaaaaacaaaaacttttagatagt 
cacgtgaagtcacgtgacagtcacgcaacagatatagaagaagataaagaagtagaagaa 

gaaagagaaaaagaagtagataaagatatcttcaaaaactcaattaattacatcatgagt
aaccttactcataatttaactcctaaccaaatggaacagataggatatgccattgatgat 
attggacaacatgcagatgaagttgttgaagtagctactgattatacaaaagacaaaggt 
tgtcatgcaggttacctaatcaaagtgttaaacaactgggctaaagagaacgttaagaat 
aaaaaagaggctgaaaataaaattaaacctaaaaataaaaaaactgtaacagatgatgta
attgctcaaatggagaaagagctaggagatgaaagttaa

Sequence 1368

MFDDSKIKYIEALPERDTIITLWVKLLTLAGKYNEQGYIMLSESLPYNEEMLANEFNRPI 
NSIRLALQTFEKLSMIEEVNGVFKVSNWEKHQNIEGLEKIREQNRLRKQKQRKKQKLLDS
HVKSRDSHATDIEEDKEVEEEREKEVDKDIFKNSINYIMSNLTHNLTPNQMEQIGYAIDD 
IGQHADEVVEVATDYTKDKGCHAGYLIKVLNNWAKENVKNKKEAENKIKPKNKKTVTDDV 
IAQMEKELGDES* 

Sequence 1369

Contig_0593_pos_6696_6343,

putative peptide of unknown function 

atgactaaacaacaagccctagaagtaattaagacaattagacatgtatacaacattgac 
tttgacagacctaaattagaaacatgggttaacattttgagccaaaatggggattatgaa 
ccgactaaaaaaacagtaatgcaatatatcaatgatgctaatccttatccacctagtatt
ccaaacataatgagaaaagaagtcaaagtcgtaaaagaagagcctgtcgacgaaaaaact 
gctagacatcgttggagaatgaaaaatgatccagaatacgtagcacaacgtaaaaagata 
ttagacgacttcagaaagaagttaagtgagtttggagtgagtgacgatgaatga 

Sequence 1370

MTKQQALEVIKTIRHVYNIDFDRPKLETWVNILSQNGDYEPTKKTVMQYINDANPYPPSI 
PNIMRKEVKVVKEEPVDEKTARHRWRMKNDPEYVAQRKKILDDFRKKLSEFGVSDDE* 
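
Each peptide in this listing can be compared against a conceptual translation of the nucleotide sequence that precedes it, which is a convenient way to spot transcription damage in either record. The sketch below is illustrative only: it assumes the standard genetic code, strips whitespace from the input, and marks any unrecognised or incomplete codon as X. The function name `translate` is a hypothetical helper, not something defined in the listing.

```python
# Standard genetic code, laid out in t, c, a, g order for each codon position.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a nucleotide string in frame 1; '*' marks a stop codon."""
    seq = "".join(dna.split()).lower()  # drop spaces and line breaks
    residues = []
    for pos in range(0, len(seq), 3):
        codon = seq[pos:pos + 3]
        # Unknown bases or a trailing partial codon are reported as X.
        residues.append(CODON_TABLE.get(codon, "X"))
    return "".join(residues)


# First twelve and last six codons of Sequence 1369.
print(translate("atgactaaacaacaagccctagaagtaattaagaca"))  # MTKQQALEVIKT
print(translate("gtgagtgacgatgaatga"))                    # VSDDE*
```

The two fragments reproduce the beginning (MTKQQALEVIKT) and end (VSDDE*) of Sequence 1370. Initial gtg codons are rendered as V under this plain table lookup, which matches how peptides such as Sequence 1354 are printed in the listing.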

Sequence 1371 
Contig_0593_pos_4059_3700,

putative peptide of unknown function 

atggaagcaacaaaaatgagagttaaaaataaatacttctctattacaccagatgtagta 
gagaaaatgaaagaagcagatatcaatcccgatatcttaagacaaagattagcttctggt 
tggaagtttgaagatgcaatagaagcacctattggagtaagacgtagtgagtgggatagt 
ttgaaacctaaagaggacgaaattgctagttataaagagagaatggagcaacgcagatta
caagagttgaaacgtaagaaaccacatttattcacagtaaatcaaaaacactctcgtggt 
aaatggtgcacgtatcttatggagaatgacatctttcctagaaaggtggttagatcatga 

Sequence 1372

MEATKMRVKNKYFSITPDVVEKMKEADINPDILRQRLASGWKFEDAIEAPIGVRRSEWDS
LKPKEDEIASYKERMEQRRLQELKRKKPHLFTVNQKHSRGKWCTYLMENDIFPRKVVRS* 



Sequence 1373

Contig_0593_pos_3664_3260, 

putative peptide of unknown function 

atgcatggacttaacggtgtggaagttacagcaaaggttaaaaatgtatatcgtttagtt 
cattcaagacgtggtgcggctaaatgggttgctgatgtaaaagcgattgatgggaaaact 
tggactattgatgataattacgatttttactcattaccagatgaaaatgaagaaaacaaa
aagacgttatatgacaagattaaccacccgtcacattacacatatggagaaatagaagta 
attgaattcatagaacaggtcactaaagattataaaccagagttagcatttgcgattggt 
aatgcaattaaatatatcagtcgagctaatcgtaagaacggaaaagaagatttagacaaa 
gcgcgttggtatctaaacagagcattcgaaaagtgggaaaattaa 

Sequence 1374 

MHGLNGVEVTAKVKNVYRLVHSRRGAAKWVADVKAIDGKTWTIDDNYDFYSLPDENEENK 
KTLYDKINHPSHYTYGEIEVIEFIEQVTKDYKPELAFAIGNAIKYISRANRKNGKEDLDK
ARWYLNRAFEKWEN* 

Sequence 1375 

Contig_0593_pos_3240_1561,

putative peptide of unknown function 

atgaaaaaagtacaaaataaatattcaatcagaaagtttaacacagcagtaggtagtgtc
attgtagggacagcaattttctttggaggacaagcacacgctgcagaaaacgaagttcaa
agagaacaacctaatgtagaacaagcagaaacaacacaagaagtacacacagacacacta 
caagcaagtaatgaagaagtggtacaaaacaaccaagaaaaagaaacggcccaaaatgat 
gtgtctacacaagcgacagagcaacctcgatttatcactaacacagatttcaaaacacaa 
acagacgaaaacggacaaacacctatttggagtaaacaacaagtcaattacgaatggaac
gctacaggttataaaaagggtgatgaaattaactttaatcttcctgaacagttaagactt 
gctaatgaacaaaactttgacttaaacacacctgataatgtcaatattggcagagttaat 
gcgacaagagatggattagtgaatgtaagcttaactgatccgacagattacttagcgaca 
catgaaaatactaaaggttggatgtattttgagactatgttcaacagagataaagtcaaa 

gccggcgagagttacgacatcaaacttggtgacaagggatacacagtagacgttgcacaa
aatgaaattaacaaaagtccgttacaaaaatggggatatgtagatgatgacaacaaagta 
cgttgggatgtaagaataaaccaggatgaacaaactattaataatggacgtttagaagat 
acattgggtgacggtttaacattcgatgaagattcattaactgtcactgaattcgatgta 
gataatcaagagttaggaagtcctttctatgattataaattaacaccaaccactaatgga 

tttacaatcgatttcttaaaacaaattaataaagcgtatgaaattgaatacacaacaaca
cctttattaggtacaaatcaccaatacacaaacagtgttgaattgactggtgacggatac 
aaagaaacattagaaaatgttgaatctgaagtatctaacgctggtggaggtggtgagggt 
gacaacattccacctgtagagcctgaacaacctacagaaccagagcaacctaaggaacct 
gaaactccggaagaaccaacaacaccaaacgttccggaagaaccaaacacacctgaacaa 

ccgaacaatcctgaaacacctgaagaacctaacaaaccagaacaacctactaagtcagaa
gaaccaaaacaaccacgacctgaaacaccacaaactcctgaacaatcagaagttaaagaa 
aaacaccaagaacctaaaacaccaacagagaaaaaagaaacacctattacacctcaaaaa 
ccaagtaaaattgttgaggtagaaaataaagaagaagtgtcaccaaaagaaatacaagat
gacacgactttcgttgtaacacgatcaaaagaacaaccaaaacatattgataaaccagtt 

gaaagagttacgggtaacgtggctaatgaacaagaattagaaaaagagtcgaaagaagct
gaaaaagtacaggaaaaagagcttccgaaaacaggacaagttgaaaatgtaggtgtcttt 
ggattgttagcactagtcactggtatcgcacttgtaagacgacgtaataaggaggattaa 

Sequence 1376

MKKVQNKYSIRKFNTAVGSVIVGTAIFFGGQAHAAENEVQREQPNVEQAETTQEVHTDTL 
QASNEEVVQNNQEKETAQNDVSTQATEQPRFITNTDFKTQTDENGQTPIWSKQQVNYEWN 
ATGYKKGDEINFNLPEQLRLANEQNFDLNTPDNVNIGRVNATRDGLVNVSLTDPTDYLAT 
HENTKGWMYFETMFNRDKVKAGESYDIKLGDKGYTVDVAQNEINKSPLQKWGYVDDDNKV 

RWDVRINQDEQTINNGRLEDTLGDGLTFDEDSLTVTEFDVDNQELGSPFYDYKLTPTTNG
FTIDFLKQINKAYEIEYTTTPLLGTNHQYTNSVELTGDGYKETLENVESEVSNAGGGGEG 
DNIPPVEPEQPTEPEQPKEPETPEEPTTPNVPEEPNTPEQPNNPETPEEPNKPEQPTKSE 
EPKQPRPETPQTPEQSEVKEKHQEPKTPTEKKETPITPQKPSKIVEVENKEEVSPKEIQD 
DTTFVVTRSKEQPKHIDKPVERVTGNVANEQELEKESKEAEKVQEKELPKTGQVENVGVF

GLLALVTGIALVRRRNKED*

Sequence 1377 

Contig_0594_pos_2850_3191, 
is similar to (with p-value 5.0e-39) 
>gp:gp|AF044951|AF044951_1 Staphylococcus aureus repressor protein
(rzcA) and transport protein (rzcB) genes, complete cds. NID:
g3445565.

atgagtggatatcatattaacgttgaaagcaatcaggatacattgattaaagtcactcat 
attttcaaagctttaagtgattttaatcgtgtgagaataatggagtttcttgaaaacggt 
gaagcaagtgttggacatatttcacattctttaaatatgactcaatcaaatgtatcacat
caattgaaactacttaaaagcactcatcttgttaaatctaaaagacaagggcaatcaatg 
atttattctatagatgatatacacgtttcaactttacttaaacaagccattcaccacgcc 
aaacatcctagtgaaggtggaatttctaatgacaaatcataa 

Sequence 1378

MSGYHINVESNQDTLIKVTHIFKALSDFNRVRIMEFLENGEASVGHISHSLNMTQSNVSH
QLKLLKSTHLVKSKRQGQSMIYSIDDIHVSTLLKQAIHHAKHPSEGGISNDKS*

Sequence 1379

Contig_0595_pos_1295_1999,

is similar to (with p-value 9.0e-42) 

>gp:gp|AJ002481|LHAJ2481_1 Lactobacillus helveticus gene encoding
transmembrane protein. NID: g3850046.

atggcattagtattttcgctcatatcaggtgcaggatgggcatttggtcaaattattact
tttaaagcgttcgaattagtaggttcatcaagagcgatgccaattactactgcatttcaa 
ttacttggtgcatctttatggggcgtttttgcgcttggcaactggcccggtataacaaac 
aaaatcattggatttctagctttactcgtaatccttataggtgcacgtatgactgtatgg
actgaaacaaagcaacaagaatatagtaaaaatctacgaagtgcagtgatcttattactt 

gtaggtgaaattggctattggatatattctgctgcacctcaagcaacggatattggtgga
tttaaagcttttttacctcaagctataggaatggtcattgtggctgtcatctatgcgttg 
atgaatatgtctaaaggtaatgcttttaaagagaaagtaagttggcaacaaacaatatcc 
ggatttttctttgcgtttgctgctttaacttatttaatttcagcacaacctaatatgaat
ggtttagcaacaggatttgttctatctcaaacatctgtagtattagcaacgctaacaggc 

atttttttcttaaatcagaaaaaaacatcaaaagaattaatgattacaattgtgggatta
gttcttattttagttgcagcatcaatcacagtgtttattaaataa 

Sequence 1380 

MALVFSLISGAGWAFGQIITFKAFELVGSSRAMPITTAFQLLGASLWGVFALGNWPGITN
KIIGFLALLVILIGARMTVWTETKQQEYSKNLRSAVILLLVGEIGYWIYSAAPQATDIGG
FKAFLPQAIGMVIVAVIYALMNMSKGNAFKEKVSWQQTISGFFFAFAALTYLISAQPNMN
GLATGFVLSQTSVVLATLTGIFFLNQKKTSKELMITIVGLVLILVAASITVFIK* 

Sequence 1381 
Contig_0595_pos_2021_2425,

is similar to (with p-value 2.0e-24) 

>sp:sp|P44734|RBSD_HAEIN HIGH AFFINITY RIBOSE TRANSPORT PROTEIN RBSD.
>pir:pir|G64072|G64072 high affinity ribose transport protein (rbsD)
homolog - Haemophilus influenzae (strain Rd KW20)
>gp:gp|U32732|U32732_2 Haemophilus influenzae Rd section 47 of 163 of
the complete genome. NID: g1573480.

atgaagaaaacagcagtattaaatagtcacatttcaagcgcaatctccacactaggtcac 
tatgatttattaacgattaatgatgcgggtatgcctatacctaatgatgacaaacgtata 
gatttagcagtgactaagtcattgccatgtttcattgatgtgttggagacagtgttaact 

gaaatggaaatacaaaaaatatatttagcagaagaaattaaaactgcgaatgcacagcaa
ttaaaagcaattaagaaattaatcaatgatgatgtagaaattaaatttattgcgcattct 
gagatgaaagaaatgttaaaatctcctttaaataaaggaaatatacgtactggtgaaatc
acccctttttctaacattatcctagaatctaatgtgactttttaa 

Sequence 1382

MKKTAVLNSHISSAISTLGHYDLLTINDAGMPIPNDDKRIDLAVTKSLPCFIDVLETVLT 
EMEIQKIYLAEEIKTANAQQLKAIKKLINDDVEIKFIAHSEMKEMLKSPLNKGNIRTGEI
TPFSNIILESNVTF* 

Sequence 1383

Contig_0595_pos_3846_4859,

putative peptide of unknown function 

atggaacgattttgttgtgtaaatcaaattaactatattcaaatgaatccgttagaagcc 
aaatttaaaacgagcgctctaagatcatggaaaactgatcaggcagatgctcataagctt 

gcttgtttaggaccgacgcttaaacaaacagacaacttacctatacatgagttaatattc
tttgaattaagagaacgcgtccgttttcatctagaaatcgagaatgaacaaaatcgactt 
aaatttcagatccttgaattactccatcaaacattccctggtttagaaagattatttagt 
agtcgatattcaatcattgcactcaacatcgcagaaatctttactcatccagacatggtt 
cttgatatcgacaaggaggtactgattacacatatattcaattctacagataagggaatg 

tcaatggataaagctacaaaatatgcacttcaattaagggtgattgctcaagaaagctat
cctaatgtcgatagacattcctttctagtcgaaaaattacgcttacttattcaacaatta 
aaacaatctattcatcatctcaaacaattagatgatgccatgattcaattagcacaacaa 
ctcgattattttgaaaatattcattcgatacctggtattggtaagctaagcacagctatg 
attattggggagattggtgatattaagcgatttaaatcaaataaacaactcaatgctttt
gttggcattgatatcaaacgatatcaatcaggtcatacacactgtagagataccatcaac
aagcgtggtaataaaaaagcgagaaaacttttattttgggtgattatgaatataataaga
gggcagcatcattatgacaatcatgtcgtcgattattactacaaactaagaaagcagcct 
aatgagaaacctcataagactgccatcattgcttgtataaatcgattattaaaaacaatt 
cattatcttgtaatgaatcataaattgtacgattatcaaatgtcaccacattag

Sequence 1384 

MERFCCVNQINYIQMNPLEAKFKTSALRSWKTDQADAHKLACLGPTLKQTDNLPIHELIF 
FELRERVRFHLEIENEQNRLKFQILELLHQTFPGLERLFSSRYSIIALNIAEIFTHPDMV
LDIDKEVLITHIFNSTDKGMSMDKATKYALQLRVIAQESYPNVDRHSFLVEKLRLLIQQL
KQSIHHLKQLDDAMIQLAQQLDYFENIHSIPGIGKLSTAMIIGEIGDIKRFKSNKQLNAF
VGIDIKRYQSGHTHCRDTINKRGNKKARKLLFWVIMNIIRGQHHYDNHVVDYYYKLRKQP
NEKPHKTAIIACINRLLKTIHYLVMNHKLYDYQMSPH*

Sequence 1385

Contig_0595_pos_5253_6743,

is similar to (with p-value 0.0e+00)

>sp:sp|P39211|XYLB_BACSU XYLULOSE KINASE (EC 2.7.1.17) (XYLULOKINASE).
>gp:gp|U66480|BSU66480_19 Bacillus subtilis SpoVK (spoVK), YnbA
(ynbA), YnbB (ynbB), GlnR (glnR), glutamine synthetase (glnA), YnaA
(ynaA), YnaB (ynaB), YnaC (ynaC), YnaD (ynaD), YnaE (ynaE), YnaF
(ynaF), YnaG (ynaG), YnaH (ynaH), YnaI (ynaI), YnaJ (ynaJ), xylan
beta-1,4-xylosidase (xynB), xylose repressor (xylR), xylose isomerase
(xylA), xylulose kinase (xylB), YncB (yncB), YncC (yncC), YncD (yncD)
and YncE (yncE) genes, complete cds. NID: g1750106.
>gp:gp|Z99113|BSUB0010_55 Bacillus subtilis complete genome (section
10 of 21): from 1781201 to 2014980. NID: g2634090.

atggtgaaagaagtagttctaggaattgatttaggcacaagcgcaataaaaattattgct

gttgatcaactaggaaatgtcattgaatcagtaagcgaaacattaaagttataccaagag
catcctggttatagcgaacaagaccctaatgaatggtttgaggctactaagaaagggata 
aaagaattaattcaatcaacagaaatgtcagataagatagtaaaggggatttctttttca 
ggtcaaatgcatgggttggtcatagttgatgataatggcattcctttgagaaaagcgatt 
ttatggaatgatactagaaattcaatacaatgtagacaaattgaagatatatatggtgaa

agattgaattacaatccgatattagaaggatttacacttcctaaaatgttatgggtacaa
caacatgaacctgaaatttggaatcgagttgatgtttttatgttgcctaaagattattta 
cgttattgcttaacgcagacaattcatatggaatatagtgatgcatgtagtacattatta 
ttcaatcctgagaattatgaatggacaaaagatgttggagatacatttaacattggtgat
atctatccacccttagtaaaatcacattcgtatgttggaaatgtaacttcatcactggct

aaagaattaggattatctagtgatgttgctgtatatgctgggggtggtgataatgcatgt
ggtgcaattggtgctggtgtcatccatgataaaagtgcattatgtagcataggtacttca 
ggtgttgtattaaatgttgaataccaacgtgtgacctcatatgatagtaatttacactta
ttcaatcatagtgttccagatacttattacgcaatgggagtaacgttggcagcaggctat 
agtttaaactggttaaaacaaactttttttgaaaatgaatcttttgaagagattttaaat

ttagctgcatcttcaaagataggtgccaatggactactatttacaccttacttagctgga
gaacgtacgccacatggtgatgctcaaatacgtggaagttttataggtatcagtgggcaa 
catactaaagctgactttgcgagagcagtaatcgaaggcataacgtattctttatatgat 
tctataaagattatgagacgagctggtcatgaaatgaactctatcacttcaatcggtggt 
ggtgctaagagtagattttggttacaacttcaagctgatatttttaatgtgcaaataaaa 

agattgaagcatgaagaaggcccaagcatgggagcggcaattttagcggcatacggtcta
ggatggtttaaaacaattgagtcttgtgtagaggcatttattaaagtagacgaggtgttt 
gagccgaataatgaaaatcatgacctttatgaacaatactattcagtttatgaagctata
tataaacaaacgaaacagcttactgctgatttgttaacgataacgaattaa 

Sequence 1386

MVKEVVLGIDLGTSAIKIIAVDQLGNVIESVSETLKLYQEHPGYSEQDPNEWFEATKKGI
KELIQSTEMSDKIVKGISFSGQMHGLVIVDDNGIPLRKAILWNDTRNSIQCRQIEDIYGE
RLNYNPILEGFTLPKMLWVQQHEPEIWNRVDVFMLPKDYLRYCLTQTIHMEYSDACSTLL
FNPENYEWTKDVGDTFNIGDIYPPLVKSHSYVGNVTSSLAKELGLSSDVAVYAGGGDNAC
GAIGAGVIHDKSALCSIGTSGVVLNVEYQRVTSYDSNLHLFNHSVPDTYYAMGVTLAAGY
SLNWLKQTFFENESFEEILNLAASSKIGANGLLFTPYLAGERTPHGDAQIRGSFIGISGQ 
HTKADFARAVIEGITYSLYDSIKIMRRAGHEMNSITSIGGGAKSRFWLQLQADIFNVQIK 
RLKHEEGPSMGAAILAAYGLGWFKTIESCVEAFIKVDEVFEPNNENHDLYEQYYSVYEAI 
YKQTKQLTADLLTITN*

Sequence 1387 

Contig_0595_pos_2400_2083,

putative peptide of unknown function 

atgttagaaaaaggggtgatttcaccagtacgtatatttcctttatttaaaggagatttt
aacatttctttcatctcagaatgcgcaataaatttaatttctacatcatcattgattaat 
ttcttaattgcttttaattgctgtgcattcgcagttttaatttcttctgctaaatatatt 
ttttgtatttccatttcagttaacactgtctccaacacatcaatgaaacatggcaatgac 
ttagtcactgctaaatctatacgtttgtcatcattaggtataggcatacccgcatcatta 

atcgttaataaatcatag

Sequence 1388 

MLEKGVISPVRIFPLFKGDFNISFISECAINLISTSSLINFLIAFNCCAFAVLISSAKYI 
FCISISVNTVSNTSMKHGNDLVTAKSIRLSSLGIGIPASLIVNKS*

Sequence 1389 

Contig_0596_pos_5465_6574,

putative peptide of unknown function 

atggtaaacagtaatgatattgtttctattgttattagtgatattacacgtccaacgccc 

aaccatattcttgtacctttactaattgaggaattaaatcatgttcctcgtgagaatttc
gtaattattaatggtacagggactcatcgagatcaaacgcgagatgaattgattcaaatg 
ttaggtgaagatattgtaaattcagtaaaaatcgttcacaatcattgctcagaaaaagaa 
agtctagctaaagtgggacacagtcaatatggatgtgatgtttatttaaacaaagcatat 
gtagaatccgattttaaaattgtaacaggttttattgaaccacactttttcgccggattt 

tcaggtggacctaaagggataatgcctggaattgcaggtttagaaacaattcaaacattt
cataatgcaaaaatgattggcgatccgagatcaacgtggggaaatttagaagacaatcca 
gttcaagatatggcacgggaagttaaccgtatgtgtaaacctgactttttacttaatgtt 
gcattgaataaaagtaaagaaattactgcagcatttgctggtgaaatcttagatacacac 
aaagaaggatgcgcatatgtaaaagatcatgcaatgtttaaatgtgagcaacgctttgat 

attgttatcgcatcaaattctggctatcctttagatcaaaatttatatcaaacagttaaa
gggatgagtgcagcgagtaaagttgttaaaaaagacggtcatattattatggtatctgag 
tgtgcagatggctttcctgatcatggtaagtttgccgaaattttcaaaatggcagacaca 
cctcaaggtattttagaacttattcacaatccaaactttaaggaagttgaccaatggcaa 
gtacaaaaacaagcaagtattcaaacttttgccaatgtgcatgtttattcagaacttact 

gaccaacaacttaaagactcgatgttaatcccaacctctaacattgaacatacaatacaa
gaattagaacatcgatatggccgtaaattaaccattggtgttatgccacaaggtccttta 
acaataccgtacgtagaagataaagaataa 

Sequence 1390 

MVNSNDIVSIVISDITRPTPNHILVPLLIEELNHVPRENFVIINGTGTHRDQTRDELIQM
LGEDIVNSVKIVHNHCSEKESLAKVGHSQYGCDVYLNKAYVESDFKIVTGFIEPHFFAGF 
SGGPKGIMPGIAGLETIQTFHNAKMIGDPRSTWGNLEDNPVQDMAREVNRMCKPDFLLNV 
ALNKSKEITAAFAGEILDTHKEGCAYVKDHAMFKCEQRFDIVIASNSGYPLDQNLYQTVK 
GMSAASKVVKKDGHIIMVSECADGFPDHGKFAEIFKMADTPQGILELIHNPNFKEVDQWQ 

VQKQASIQTFANVHVYSELTDQQLKDSMLIPTSNIEHTIQELEHRYGRKLTIGVMPQGPL
TIPYVEDKE* 

Sequence 1391 
Contig_0596_pos_6618_0,

is similar to (with p-value 1.0e-17)

>sp:sp|P73846|YH17_SYNY3 HYPOTHETICAL 30.2 KD PROTEIN SLR1717.
>gp:gp|D90910|D90910_32 Synechocystis sp. PCC6803 complete genome,
12/27, 1430419-1576592. NID: g1652956.

atgaaattagacgcactattgaaagacatgcagagtgtagtaattgccttctcaggtgga
gtagatagtagcttgttactgaaaaaagcgattgatattttaggtgttaactatgttaaa
cctgttgtagtaaaatcagaattatttagaaatgaagagtttgaactagcgcttaaactt 
ggacaaagtctaggtgttgaagtattagaaactgaaatgtctgaacttcaagatgcgaat 
atcgttaaaaatacgcctgaaagttggtactatagcaagcgcttgatgtatagtcaactt 
gagaatattaagaataaactaggatttaattatgtgctagatggtatgattatggatgac
ttagatgattttcgtcccggattaaaagcaagagacgactttggtgttcgtagcgtttta 
caagaagcaaaactatctttagagcacagtggcgatgatatc 

Sequence 1392 

MKLDALLKDMQSVVIAFSGGVDSSLLLKKAIDILGVNYVKPVVVKSELFRNEEFELALKL
GQSLGVEVLETEMSELQDANIVKNTPESWYYSKRLMYSQLENIKNKLGFNYVLDGMIMDD
LDDFRPGLKARDDFGVRSVLQEAKLSLEHSGDDI 

Sequence 1393 
Contig_0596_pos_3043_2615,

putative peptide of unknown function 

gtgagtagtaaattgaataaaaatattaacattcaaacccgtcaagttttgaaacagaat 
ggtgaaaagcagagatttgagtttactacaaaaggttcttggcaacaaaaatttgcagat 
tttatacgttacgaagaacaaattgaagatgctaaagttaatgttacgattaaaattgaa 
gatagcggtgtaaagttaattcgtaaaggcgacattaatatgaacttacatttcgtcgaa
ggacatgagacgacaacactctatgatgtacctaccggaaaaatacctttaactgttaaa 
acactaagccttatgcatttcgttactcataatggcggtaaacttaaaatacattatgag 
ttatatcaagatgaacaaaagatgggttcttatcaatatgaaataaattataaggagata
agcgaatga 

Sequence 1394 

VSSKLNKNINIQTRQVLKQNGEKQRFEFTTKGSWQQKFADFIRYEEQIEDAKVNVTIKIE 
DSGVKLIRKGDINMNLHFVEGHETTTLYDVPTGKIPLTVKTLSLMHFVTHNGGKLKIHYE
LYQDEQKMGSYQYEINYKEISE* 

Sequence 1395 

Contig_0596_pos_1862_957, 

is similar to (with p-value 0.0e+00) 

>gp:gp|Z99123|BSUB0020_30 Bacillus subtilis complete genome
(section 20 of 21): from 3798401 to 4010550. NID: g2636240.
>gp:gp|Z97024|BSZ97024_4 Bacillus subtilis ywiA, sbo, ywiB,
argS and narK genes. NID: g2224752. 

gtgcactttgataattggtttagcgaaacatctttatatgaaaatggcgcgattaaaaat 
acattatctaaaatgaaagagctaggctatacgtatgaagcagatggcgcgacttggtta 

cgtacaagtgactttaaagacgataaagatcgtgtattaataaaaaaagacggtaattac
acttatttcacaccagatacggcctatcactacaataagattaatagaggaaatgatatt 
ttaattgatttaatgggtgctgatcaccacggttatattaaccgtctcaaagctagctta 
gaaacatttggtgtagacagcgatcgtcttgaaattcaaattatgcaaatggttcgcctt 
atgcaaaacggagaagaagtgaagatgagtaagcgtacaggcaatgcgattactttacgt 

gaaattatggatgaagtcggtatcgatgctgcacgttatttcttaacaatgcgtagtcct
gattctcattttgactttgatttagagttagctaaagagcaatctcaagacaatccaatt 
tattacgcacaatatgcacatgcacgaatttgttctattctaaagcaagctaaagaacaa 
ggtattgaagtgtctactgatgcagacttttctaaaataaataatgataaagcaatagac
ttattaaaaaaagtagcagaattcgaatcgacgatagagagtgcagccgaacatcgtgca 

cctcatcgtctaactaattatattcaggatctagctgcagcattccataaattttataat
gctgaaaaagtgcttacagatgatacggaaaaaactaaagcacatgtagctatgattgaa 
gcggtgcgtattaccttgcacaatgcattagcattagtaggtgttacagcaccagaatct 
atgtaa 

Sequence 1396

VHFDNWFSETSLYENGAIKNTLSKMKELGYTYEADGATWLRTSDFKDDKDRVLIKKDGNY 
TYFTPDTAYHYNKINRGNDILIDLMGADHHGYINRLKASLETFGVDSDRLEIQIMQMVRL
MQNGEEVKMSKRTGNAITLREIMDEVGIDAARYFLTMRSPDSHFDFDLELAKEQSQDNPI
YYAQYAHARICSILKQAKEQGIEVSTDADFSKINNDKAIDLLKKVAEFESTIESAAEHRA
PHRLTNYIQDLAAAFHKFYNAEKVLTDDTEKTKAHVAMIEAVRITLHNALALVGVTAPES
M*

Sequence 1397 
Contig_0596_pos_745_65,

putative peptide of unknown function 

atgcgagatataacaatcatgcgaagatttgtagggggagaaatcatgttatcaatagaa 
aaattatatcaaattctatatcaaaatatgggccctcaatattggtggccagcagaaacg
ccaatagaaatgatgcttggggcaattctagtccaaaatactaattggaacaatgcagat 

atagcgttatcaagattaaaagaagaaacttcttttaatgcacagacgatattgaaaatg
cctttagaatcgttgcagcaagtgatacgttcgagtggtttctataaaaataaagctaag
gctatacaggcattgttactatggttaaatcaacatcattttgattatagtagtatagct
aagttatacggtgatagcttaagaaaagaattactcaccatccgtggtataggtgaagag
accgccgatgtcttaatagtatatatttttaaaggtaaagaattcatacctgatagttat 

actagacgtatttttagaaaattgggatatcaacatacagaaagttatcataaattgaaa
caggaattaacacttcctgaatcattttcaaatcaagatgcaaatgagtttcacgcttta 
ttagataattttgggaaaaattattttaatggtaaggggaaacaacgctataccttttta 
gatacctattttaaaaaataa 

Sequence 1398

MRDITIMRRFVGGEIMLSIEKLYQILYQNMGPQYWWPAETPIEMMLGAILVQNTNWNNAD 
IALSRLKEETSFNAQTILKMPLESLQQVIRSSGFYKNKAKAIQALLLWLNQHHFDYSSIA 
KLYGDSLRKELLTIRGIGEETADVLIVYIFKGKEFIPDSYTRRIFRKLGYQHTESYHKLK
QELTLPESFSNQDANEFHALLDNFGKNYFNGKGKQRYTFLDTYFKK*

Sequence 1399 

Contig_0598_pos_7659_9626,

is similar to (with p-value 6.0e-34) 

>sp:sp|P37710|ALYS_ENTFA AUTOLYSIN (EC 3.5.1.28) (N-ACETYLMU
RAMOYL-L-ALANINE AMIDASE). >pir:pir|A38109|A38109 autolysin
- Enterococcus faecalis >gp:gp|M58002|STRHYDROLA_1 Streptoco
ccus faecalis bacterial cell wall hydrolase gene, complete c
ds. NID: g153658.

atgaagaaaaataaatttttagtatatttactatcgacggcgcttatcacgccaaccttc 

gctacacaaacagcttttgctgaagattcatctaataaaaatacaaattcagataaaatg
gaacaacatcaatcacaaaaagaaacatcaaaacaatctgaaaaagatgaatttaacaac 
gatgattctaaacacgattctgatgataaaaaaagcacttctgacagcaaggacaaagac 
tctaataaaccattatcagctgactcaacacatcgtaactataaaatgaaagatgataat 
ttagttgatcaactttatgataattttaagtctcagtcagtagatttttctaaatactgg 

gaaccgaataaatacgaagacagttttagtttaacgtcactcatacaaaatttatttgat
tttgattctgatataacagattacgaacagccacaaaagacaagccattcttctaatgac
gaaaaagatcaagtagaccaagcagatcaggcaaaacaaccatcacaacatcaagaacaa 
tcacagtcgtctgctaaacaagatcaagaatcatcaaacgatgaaaaagaaaagacaact 
aaccatcaagccgattctgacgtcagtgatttacttggagaaatggataaagaagatcaa 

gaaggcgaaaacgtagatacaaacaaaaatcaatcttcttctgagcaacaacaaactcaa
gcgaatgatgatagctcagaacgtaacaagaaatattctagtattacagattcagcatta 
gactctatattagatgaatatagtcaggacgctaagaaaacagaaaaagattacaataag 
agcaagaatacaagtcacactaaaacatctcaaagtgataatgccgacaagaatccacaa 
ttaccaacagatgatgaattaaaacatcaatcaaaacctgcacaatcatttgaggatgac 

attaaacgctcaaatacacgttcaacaagtcttttccaacaactacctgaattagacaat
ggtgacttatcttctgattcatttaatgttgttgacagtcaagacacacgtgatttcatt
caatcaattgctaaagatgcgcatcagattggaaaagaccaagatatctatgcatcagtt
atgattgctcaagctattttagaatctgactctggaaaaagttcacttgcacaatcacca
aatcataacttgtttggaatcaaaggtgactacaaaggacaatctgtaacttttaatact

ttagaagctgatagcagtaatcatatgttcagtatccaagcaggtttccgtaaataccca
agtactaaacaatctcttgaagattatgcagatttaatcaaacatggtatcgatggtaat 
ccgtcaatttataaaccaacttggaagagtgaagctctatcatataaagatgctacttca 
catctgtcacgctcatacgccacagatcctaattattctaaaaaattaaatagtattatt
aaacattatcatttaacatcttttgacaaagaaaaaatgcctaacatgaagaaatacaac 
aaatcaataggtacggatgtctctggtaatgacttcaaaccatttactgaaacttccggt
acatcaccttacccacatggccaatgtacttggtatgtgtaccaccgtatgaatcaattt 
gatgcatccatttctggtgacttaggtgatgctcataattggaataatcgtgctgaaagt 
gaaggctatacggtaacgcacacacctaaaaatcatactgcagttgtgtttgaagctgga 
caattaggtgctgatacacagtatggtcatgttgctttcgttgaaaaagttaatgacgac
ggttcaattgttatttctgaatcaaatgttaaaggattaggtgtcatttcattcagaact 
attgatgcagaagatgctcaagatttagattacattaaaggtaaatag 

Sequence 1400

MKKNKFLVYLLSTALITPTFATQTAFAEDSSNKNTNSDKMEQHQSQKETSKQSEKDEFNN
DDSKHDSDDKKSTSDSKDKDSNKPLSADSTHRNYKMKDDNLVDQLYDNFKSQSVDFSKYW 
EPNKYEDSFSLTSLIQNLFDFDSDITDYEQPQKTSHSSNDEKDQVDQADQAKQPSQHQEQ 
SQSSAKQDQESSNDEKEKTTNHQADSDVSDLLGEMDKEDQEGENVDTNKNQSSSEQQQTQ 
ANDDSSERNKKYSSITDSALDSILDEYSQDAKKTEKDYNKSKNTSHTKTSQSDNADKNPQ 

LPTDDELKHQSKPAQSFEDDIKRSNTRSTSLFQQLPELDNGDLSSDSFNVVDSQDTRDFI
QSIAKDAHQIGKDQDIYASVMIAQAILESDSGKSSLAQSPNHNLFGIKGDYKGQSVTFNT
LEADSSNHMFSIQAGFRKYPSTKQSLEDYADLIKHGIDGNPSIYKPTWKSEALSYKDATS
HLSRSYATDPNYSKKLNSIIKHYHLTSFDKEKMPNMKKYNKSIGTDVSGNDFKPFTETSG 
TSPYPHGQCTWYVYHRMNQFDASISGDLGDAHNWNNRAESEGYTVTHTPKNHTAVVFEAG 

QLGADTQYGHVAFVEKVNDDGSIVISESNVKGLGVISFRTIDAEDAQDLDYIKGK*

Sequence 1401 

Contig_0598_pos_7371_6229,

is similar to (with p-value 3.0e-50) 

>sp:sp|P49022|PIP_LACLA PHAGE INFECTION PROTEIN. >gp:gp|L146
79|LACPIP_1 Lactococcus lactis pip and gerC2 genes, complete
cds's, and rrg gene, 5' end of cds. NID: g308860.
atgaaaaacgcactaaaactttttatcacggatttaaaaagagttgctaaaacaccaggt 
gtatgggtcatcttagctggtttagcaattcttccttcattctatgcatggtttaacctc 

tgggctatgtgggatccgtatggtcatacaggacatatcaaagttgccgtagtgaatgaa
gaccaaggtgaaaaagttcgtggtaagaatattaatgtaggaaataaaatggtcaaaact 
ttaaaaaagaatgatagttttgactggcaatttgtgagtagagaaaaagccgaccatgaa 
attaagatgggaaaatattatgcaggtatttatataccgaagaaattcacacatgaaatc 
actggtactttaagaaaacatcctcaaaaggcggatatagattttaaagtaaatcagaag 

attaatgctgtagcagctaagttaaccgatacgggatcgtcgtttgtgattgataaagca
aataaacaatttaacaaaaccgtagcaaccgctttactttctgaagctaataaagtcgga 
ctatcaattgaagataatgtacctacaatcaataaaattaagagtgctgtatatcaagct 
aataattcattgcctaaaattaatcaatttgcagacaagattattgaactaaataaacat 
caagacgatttggatgcttatgctaatcaatttagaagtttaggaaagtataaagggaat 

gtattagacgctcaagaaaaacttaatgctgttaattcgtctattccggcgcttaatgaa
agggctaaattgatacttgcacttgatagctacatgcctaatattgaaagaattttaaat 
gttgctgctaatgatgttccagcacaatttcctagaattaataggggtgtcgatattgca 
agtgaaggtattgatgcagcgagtggtcagttaaatgatgcaaaaggttatttgactcaa 
gctaaagcgagagtgggagactatcaagaagcagctggccgcgctcaagatgtgaacaac 

caagcaaatcaaaatctaagaaatcaaacatcaactacaccccaaagcgctataaaatca
tcgcattcggaagggaagagtcattcaagcattaaaacagtacctgtgagtcaatcttgc 
taa 

Sequence 1402 

MKNALKLFITDLKRVAKTPGVWVILAGLAILPSFYAWFNLWAMWDPYGHTGHIKVAVVNE
DQGEKVRGKNINVGNKMVKTLKKNDSFDWQFVSREKADHEIKMGKYYAGIYIPKKFTHEI
TGTLRKHPQKADIDFKVNQKINAVAAKLTDTGSSFVIDKANKQFNKTVATALLSEANKVG
LSIEDNVPTINKIKSAVYQANNSLPKINQFADKIIELNKHQDDLDAYANQFRSLGKYKGN
VLDAQEKLNAVNSSIPALNERAKLILALDSYMPNIERILNVAANDVPAQFPRINRGVDIA 

SEGIDAASGQLNDAKGYLTQAKARVGDYQEAAGRAQDVNNQANQNLRNQTSTTPQSAIKS
SHSEGKSHSSIKTVPVSQSC* 

Sequence 1403 

Contig_0598_pos_2393_1767,
is similar to (with p-value 4.0e-27)

>gp:gp|U51115|BSU51115_13 Bacillus subtilis CotA (cotA), Gab
P (gabP), YeaB (yeaB), YeaC (yeaC), YebA (yebA), GMP synthet
ase (guaA) genes, complete cds, and AIR carboxylase I (purE)
gene, partial cds. NID: g2239287. >gp:gp|Z99107|BSUB0004_88
Bacillus subtilis complete genome (section 4 of 21): from 6
00701 to 813890. NID: g2632866. 

atgtcagcaattgctcaaaatccatggttaatggtcttagcaatttttataattaatgta 
tgttatgtcacatttttaacaatgagaacgatattgactttgaaaggttatcgttatgtt 

gcagcagtagttagttttatggaagtcttagtttatgttgttggtttagggttagtaatg
tctagcctagaccaaatccaaaatatttttgcttatgcattaggattctcagtcggtatt 
atagtaggaatgaaaatcgaggaaaaacttgcgttaggttatacagttgtcaatgttact 
tcatcggaatatgagttagatttaccaaatgaattaagaaatttagggtatggtgtaacc 
cactacgaagcatttggtagagatggaagtcgaatggtaatgcaaatattaacaccaaga 

aaatatgaattaaagttaatggatactgtcaagaacttagatctcaaggcatttattata
gcgtatgaaccaagaaatattcacggaggattctgggttaagggtgtacgtaaacgtaaa 
ctgaaagcttatgaaccagaacaactggaagttgtagtagatcacgaagaaatagtaggt 
ggtagctcaaatgagcaaaaagtttag 

Sequence 1404

MSAIAQNPWLMVLAIFIINVCYVTFLTMRTILTLKGYRYVAAVVSFMEVLVYVVGLGLVM 
SSLDQIQNIFAYALGFSVGIIVGMKIEEKLALGYTVVNVTSSEYELDLPNELRNLGYGVT 
HYEAFGRDGSRMVMQILTPRKYELKLMDTVKNLDLKAFIIAYEPRNIHGGFWVKGVRKRK
LKAYEPEQLEVVVDHEEIVGGSSNEQKV* 

Sequence 1405 

Contig_0598_pos_0_1243,

is similar to (with p-value 0.0e+00) 

>sp:sp|P12047|PUR8_BACSU ADENYLOSUCCINATE LYASE (EC 4.3.2.2)
(ADENYLOSUCCINASE) (ASL). >pir:pir|C29326|WZBSDS adenylosuc
cinate lyase (EC 4.3.2.2) - Bacillus subtilis >gp:gp|J02732|
BACPURF_3 B. subtilis pur operon encoding purine biosynthesis
enzymes, 12 genes. NID: g143363. >gp:gp|Z99107|BSUB0004_92
Bacillus subtilis complete genome (section 4 of 21): from 60
0701 to 813890. NID: g2632866.

atgtacagtttttgtaggagctataatacatacgataggagtttacaaagaatgatagaa 
cgttattcaagagatgaaatgtctagtatttggacggatcaaaatcgctatgaagcatgg 
ttagaagtagagattctagcgtgtgaagcatggagtgagttaggttatattcctaaagaa 
gatgtaaaaaaaattcgtgaaaatgccaaggtaaatgttgaacgcgctaaagaaattgaa 

caagaaacacgacacgatgtagttgcatttacacgccaagtttctgaaactttaggtgat
gaacgtaagtgggtacattatggattaacttccactgatgtagttgacactgccttaagt 
tatgtaattaaacaggcaaacgaaatccttgaaaaagatttagaaagatttatagatgta 
ttagctgccaaagctaaaaaatatcaatatacattgatgatggggcgtacacatggagtt 
catgctgagcctacaacttttggcgtaaaaatggctctttggtatactgaaatgaaacgt 

aatcttaagcgttttaaagaagtgcgtaaagaaatagaagtagggaaaatgagtggagca
gtaggaacatttgccaatattccacccgaaattgaggcttacgtatgtgaacatttagga 
atagacactgcagctgtttctactcaaacattacaaagagatcgtcatgcttattacatt 
gctacattagcactaattgctacatcaatggaaaaatttgcagtagaaattcgtaattta 
caaaaaactgaaactagagaagtcgaagaggcatttgcaaaaggacaaaaaggatcatca 

gcaatgccacataaaagaaatccaatcggttctgaaaatattacaggtatatctcgtgtg
attcgaggatatattactactgcatacgagaatatacctttatggcatgaaagagatatc 
tctcattcttctgcagaacgtatcatgttaccagatgttacaattgctctagattatgca 
cttaatcgatttacgaatattgttgatcgactaacagtttatgaagataatatgagaaat
aatattgataaaacgtatggtttaatcttctcacaacgcgtactactagcattaattaat 

aaaggcatggtacgtgaagaagcttatgataaagttcaacctaaagcaatggaatcttgg
gaaacaaaaacaccatttagagagctaattgaacaagattcat 

Sequence 1406 

MYSFCRSYNTYDRSLQRMIERYSRDEMSSIWTDQNRYEAWLEVEILACEAWSELGYIPKE 
DVKKIRENAKVNVERAKEIEQETRHDVVAFTRQVSETLGDERKWVHYGLTSTDVVDTALS
YVIKQANEILEKDLERFIDVLAAKAKKYQYTLMMGRTHGVHAEPTTFGVKMALWYTEMKR
NLKRFKEVRKEIEVGKMSGAVGTFANIPPEIEAYVCEHLGIDTAAVSTQTLQRDRHAYYI 
ATLALIATSMEKFAVEIRNLQKTETREVEEAFAKGQKGSSAMPHKRNPIGSENITGISRV 
IRGYITTAYENIPLWHERDISHSSAERIMLPDVTIALDYALNRFTNIVDRLTVYEDNMRN
NIDKTYGLIFSQRVLLALINKGMVREEAYDKVQPKAMESWETKTPFRELIEQDSX 

Sequence 1407 

Contig_0599_pos_1311_1619, 

putative peptide of unknown function

atgatggccaacacctttaatataaccaatacattttccatacgagcggcttcgttcatt 
ccgcgtgataatagtaatgcagttaaaataatcactacagcagcaatgatatcaatgaca 
ccaccgttacttccaaatggattagataatgatttaggtaaagaaatgcccaatggtgca 
ataagacctcttaagttagcagaaaagcctgaagcaacgaaagcaacagcaataaagtat 

tctgctaaaagcgcccaaccggcaacccatccgaataattcaccaaaaagaacattaatc
catgaataa 

Sequence 1408 

MMANTFNITNTFSIRAASFIPRDNSNAVKIITTAAMISMTPPLLPNGLDNDLGKEMPNGA
IRPLKLAEKPEATKATAIKYSAKSAQPATHPNNSPKRTLIHE*

Sequence 1409 

Contig_0599_pos_2684_3322, 

is similar to (with p-value 8.0e-40) 

>sp:sp|P39592|YWBI_BACSU HYPOTHETICAL TRANSCRIPTIONAL REGULA
TOR IN EPR-GALK INTERGENIC REGION. >pir:pir|S39679|S39679 hy
pothetical protein - Bacillus subtilis >gp:gp|X73124|BSGENR_
25 B. subtilis genomic region (325 to 333). NID: g413923. >gp
:gp|Z99123|BSUB0020_126 Bacillus subtilis complete genome (s
ection 20 of 21): from 3798401 to 4010550. NID: g2636240.

atgaatgacattgtgaacgttcaaaaaggtcatattaaaataggcttatcaccaatgatg 
aatgttcaaatgtttacaaatgcattgaatcagtttcacagactctatcctaatgtgaca 
tatgaagtgattgagggtggtggtaaaattgttgagaacttaacatctaatgatgatgtg 
gatattggtattactacattacctgtagatcacactgaatttcattcaacttctttatat 

aatgaagaattattattagtagtaagtaatgaccatcatttagcacatttaaataaagta
gacatggcagatttgaaagatgaagagtttgttttatttcatgatgattattatttaaaa 
gatcaaattatagagaactgtaaaaggctaggctattaccctaaaactgttgctaatatt 
tctcaaattagttttatcgctaatatgattcaacaaggaataggaattagtatcgttcca
gaaagtttagttaatttaatggggaataacgtaacgtccattcaattagagaatgttgaa 

ttatcatggcatcttggcgtgatatggagaaaagatgcttatctcaatcatgtaactcgc
aaatggattgaatttatttctgagatgaaaccaacatag 

Sequence 1410 

MNDIVNVQKGHIKIGLSPMMNVQMFTNALNQFHRLYPNVTYEVIEGGGKIVENLTSNDDV 
DIGITTLPVDHTEFHSTSLYNEELLLVVSNDHHLAHLNKVDMADLKDEEFVLFHDDYYLK
DQIIENCKRLGYYPKTVANISQISFIANMIQQGIGISIVPESLVNLMGNNVTSIQLENVE 
LSWHLGVIWRKDAYLNHVTRKWIEFISEMKPT* 

Sequence 1411 
Contig_0599_pos_4138_5106,

is similar to (with p-value 8.0e-89) 

>pir:pir|A25805|A25805 L-lactate dehydrogenase (EC 1.1.1.27)

- Bacillus subtilis 
atgaaggagttcgttaaaatgaaaaaatttgggaaaaaagttgttttagtaggagacggt 
tccgtaggttcaagttatgcatttgctatggtgactcaaggaattgcagatgaatttgta
attattgatattgcaaaagataaagtggaagcagacgttaaagatttaaaccatggtgca 
ctttacagttcttcaccagtgactgtaaaagctggagaatatgaagattgtaaagatgca 
gatttagttgttattacagcaggtgcacctcaaaaaccgggtgaaactcgtttacaactt 
gttgagaaaaatactaaaatcatgaaaagtatcgtaactagtgtcatggatagtggcttt 
gatggtttcttcctaattgctgcaaacccagttgatatcttaacacgttatgttaaagaa
gttacaggtttaccagctgaacgtgttattggttctggtacagtgcttgatagtgcaaga 
ttcagatatttaataagtaaagaattaggtgttacatcaagtagtgttcacgctagcatt 
ataggtgaacatggtgactctgaacttgcagtttggtctcaagcaaacgttggaggtatt 
tcagtgtatgatacattgaaagaagaaactggtagcgatgctaaagcgaatgaaatttat
attaatacaagagatgctgcttacgatatcattcaagctaaaggatctacgtattatggt 
atagctctagcactattacgtatttctaaagctttactaaataatgaaaatagtattttg 
acagtttctagtcaacttaatggtcaatatggatttaacgatgtttatcttggcttacca 
acacttatcaatcaaaatggtgcagttaaaatttatgaaacaccattaaatgataacgaa 
ctacaattactagaaaaatcagtgaaaactttagaagacacttatgattctataaaacat
ttagtttaa 

Sequence 1412 

MKEFVKMKKFGKKVVLVGDGSVGSSYAFAMVTQGIADEFVIIDIAKDKVEADVKDLNHGA
LYSSSPVTVKAGEYEDCKDADLVVITAGAPQKPGETRLQLVEKNTKIMKSIVTSVMDSGF
DGFFLIAANPVDILTRYVKEVTGLPAERVIGSGTVLDSARFRYLISKELGVTSSSVHASI 
IGEHGDSELAVWSQANVGGISVYDTLKEETGSDAKANEIYINTRDAAYDIIQAKGSTYYG
IALALLRISKALLNNENSILTVSSQLNGQYGFNDVYLGLPTLINQNGAVKIYETPLNDNE 
LQLLEKSVKTLEDTYDSIKHLV* 

Sequence 1413 

Contig_0599_pos_5169_6833,

is similar to (with p-value 0.0e+00)

>gp:gp|L16975|LACALS_2 Lactococcus lactis alpha-acetolactate
synthase (als) gene, complete cds. NID: g473900. >gp:gp|A23
961|A23961_1 L. lactis alpha-acetolactate synthase gene. NID
: g809617.

atggcggaaaaacaatattctgcagcacaaatggtaattgatactttaaaaaataatgga 
gttgagtatgtatttggtattccaggtgcgaaaatcgactacttatttaatgcactagag 

gatgacgatattgaattagtcgttacgcgtcatgaacaaaacgcagcgatgattgcacaa
ggtattggtcgtttaacaggaaaaccaggtgtggctattactacaagtggcccaggggta 
agtaacttaactactggtttattaactgcaacttctgaaggtgaccctgtattagctatc 
ggtggtcaagttaaaagaaatgacttattacgtttaacacatcaaagtattgataacgca 
tcattacttagatcctcaactaaatatagtgcagaagtacaagatccagaatcactatca 

gaggttattacgaatgcaatgcgtacagccacttcaggtaaaaatggagcgagctttatc
agtattccacaagatgttatttcatcacctgtcaaagctgatgcaatttcattatgtcaa 
aaaccacatcttggtgtaccttcagaacaagaaattaatgaagtgattgaggctattaaa 
aattctaaattcccagtattattagctggaatgagaagttcaagccaagctgaaacagaa 
gctattcgccgtttagttcaaaaaacaaatttacctgttgttgaaacattccaaggtgcc 

ggcgtaattagtcgcgaattagaaaatcacttcttcggtcgtgttggtttatttagaaac
caagtgggtgacgaattacttagaaaaagtgatttagttatcacaatcggttatgaccct
attgaatatgaagcaagtaactggaataaagaattagatactaaaatcatcaatgtcgat
gaagaacatgctgaaatcactaattacatgcaaccagttaaagagttaatcggaaacatt
gcaggtacaatagatatgatttctgaacatgtaaatgaaccatttattaatcaagatcat

ttagatgaacttgaaaaattaagaggcgaaatcacagaagcaactggaattaaagcaact
cacaaagaaggtgtgatgcacccagttgaaatcattgaaacaatgcaaaaagttttaact
gatgatactactgtaactgtagatgtgggaagccattacatttggatggctcgtaaatac
agaagttacaatcctagacatttactatttagtaacggtatgcaaactctaggtgttgca
cttccatgggctatttcagctgcacttgtacgtccaaatacacaagttgtttctgtagct 

ggagacggtggtttcctattctcaggacaagaattagaaactgcagtacgtaaaaactta
aatatcattcaattaatttggaatgatggtcgttataacatggttgaattccaagaagaa 
atgaaatataaacgctcttcaggtgtagaatttggaccagttgattatgtaaaatatgca 
gaatcatttggcgctaaaggattacgtgtgactaatcaagaagaattagaggcagcactt 
aaagagggttacgaaactgatggaccagtattaattgatatcccagttaactatgcagat 

aatgttaaattatctacaaatatgttaccaaatgctttaaattaa

Sequence 1414 

MAEKQYSAAQMVIDTLKNNGVEYVFGIPGAKIDYLFNALEDDDIELVVTRHEQNAAMIAQ
GIGRLTGKPGVAITTSGPGVSNLTTGLLTATSEGDPVLAIGGQVKRNDLLRLTHQSIDNA 
SLLRSSTKYSAEVQDPESLSEVITNAMRTATSGKNGASFISIPQDVISSPVKADAISLCQ
KPHLGVPSEQEINEVIEAIKNSKFPVLLAGMRSSSQAETEAIRRLVQKTNLPVVETFQGA 
GVISRELENHFFGRVGLFRNQVGDELLRKSDLVITIGYDPIEYEASNWNKELDTKIINVD 
EEHAEITNYMQPVKELIGNIAGTIDMISEHVNEPFINQDHLDELEKLRGEITEATGIKAT 
HKEGVMHPVEIIETMQKVLTDDTTVTVDVGSHYIWMARKYRSYNPRHLLFSNGMQTLGVA
LPWAISAALVRPNTQVVSVAGDGGFLFSGQELETAVRKNLNIIQLIWNDGRYNMVEFQEE 
MKYKRSSGVEFGPVDYVKYAESFGAKGLRVTNQEELEAALKEGYETDGPVLIDIPVNYAD 
NVKLSTNMLPNALN*

Sequence 1415

Contig_0599_pos_6987_7649,

is similar to (with p-value 9.0e-54) 

>gp:gp|U92974|LLU92974_24 Lactococcus lactis unknown gene, p
artial cds, and HisC (hisC), unknown, HisG (hisG), unknown,
HisB (hisB), unknown, HisH (hisH), HisA (hisA), HisF (hisF),
HisIE (hisIE), unknown, unknown, LeuA (leuA), LeuB (leuB),
LeuC (leuC), LeuD (leuD), unknown, IlvD (ilvD), IlvB (ilvB),
IlvN, IlvC (ilvC), IlvA (ilvA), AldB (aldB) and aldR (aldR)
genes, complete cds. NID: g2565137. >gp:gp|S82499|S82499_1
aldB=alpha-acetolactate decarboxylase [Lactococcus lactis, s
sp. lactis, NCDO2118, Genomic, 840 nt]. NID: g1699351.
atggcaggtttattagaaggaactgcttcaattaatgacttattagaacatggtgattta
ggtattgctacgcttacaggttctgatggagaagtaatttttgttgatggtaaagcttat 
catgcaaatgaacataaagaatttatagaattgacaggcgacgaaatgacaccatatgca 

actgttacaaaattcaaagcagactcaagttttaaaacatctaataaaaatcaagaagaa
gtattcgatgaagttaaaaaacaaatgaaaagtgaaaatatgttctcggcagttaaaatt 
tcaggaacgtttaaaaaaatgcatgtacgtatgatgcctggtcaagaacctccatacaca 
cgtttaattgattcagctcgtagacaacctgaagaaacacgtgaaaatatcaaaggttca
atcgtaggtttcttcactccagaattattccatggtattggttctgcaggtttccatatt 

cactttgcaaatgatgatcgtgattttggtggtcatattttagactttgaagtggatgat
gtgactgttgaaatacaaaactttgaaacatttgaacaacacttcccagtagatgctaaa 
tcatttactgatgctgacattgactataaagatatagccgatgaaatcagagaagctgaa 
taa 

Sequence 1416

MAGLLEGTASINDLLEHGDLGIATLTGSDGEVIFVDGKAYHANEHKEFIELTGDEMTPYA
TVTKFKADSSFKTSNKNQEEVFDEVKKQMKSENMFSAVKISGTFKKMHVRMMPGQEPPYT 
RLIDSARRQPEETRENIKGSIVGFFTPELFHGIGSAGFHIHFANDDRDFGGHILDFEVDD 
VTVEIQNFETFEQHFPVDAKSFTDADIDYKDIADEIREAE*

Sequence 1417 

Contig_0599_pos_9020_9811, 

is similar to (with p-value 1.0e-70) 

>sp:sp|P52996|PANB_BACSU 3-METHYL-2-OXOBUTANOATE HYDROXYMETH
YLTRANSFERASE (EC 2.1.2.11) (KETOPANTOATE HYDROXYMETHYLTRANS
FERASE). >gp:gp|L47709|BACYPIA_17 Bacillus subtilis (clone Y
AC15-6B) ypiABF genes, qcrABC genes, ypjABCDEFGHI genes, bir
A gene, panBCD genes, dinG gene, ypmB gene, aspB gene, asnS
gene, dnaD gene, nth gene and ypoC gene, complete cds's. NID
: g1146223. >gp:gp|Z99115|BSUB0012_183 Bacillus subtilis com
plete genome (section 12 of 21): from 2195541 to 2409220. NI
D: g2634478.

atgaaggcatcacagcaaaagatttctatggttacagcttatgattatcctagtgctaag 
caagcacaacaagctgaaattgacatgattttggtaggagattctttaggaatgacagtg 
ttaggatatgatagtactgttcaagttacattgaacgatatgattcatcatggtaaggct
gttaaaagaggtgcttcagatacatttatagttgttgatatgcctatagggactgttggt 
ttaagtgatgaagaagatctaaaaaatgcacttaagctttatcaaaacacgaatgctaac 
gctgtcaaagtagaaggggctcatcttacatcatttattcaaaaagcaactaaaatgggt 
atacctgttgtttctcacttaggtcttacacctcaaagtgtaggtgtaatggggtataaa 
cttcaaggggatacaaagacagccgctatgcaacttatcaaagatgctaaagctatggaa
actgctggtgcagtagtactggttttagaagccatacctagtgatttagctcgagaaatt 
agtcagcaactcactattccagttataggtataggggcaggaaaagatactgatgggcaa 
gtgttagtgtatcatgatatgttaaattatggtgttgatcgacacgctaagtttgttaag 
caatttgcagacttttcaagtggtattgatggattaaggcaatataatgaagaagttaaa
gcagggacgtttccttctgaaaatcatacttacaaaaaacgtattatggatgaggtagag 
caacatgactaa 

Sequence 1418 

MKASQQKISMVTAYDYPSAKQAQQAEIDMILVGDSLGMTVLGYDSTVQVTLNDMIHHGKA
VKRGASDTFIVVDMPIGTVGLSDEEDLKNALKLYQNTNANAVKVEGAHLTSFIQKATKMG
IPVVSHLGLTPQSVGVMGYKLQGDTKTAAMQLIKDAKAMETAGAVVLVLEAIPSDLAREI 
SQQLTIPVIGIGAGKDTDGQVLVYHDMLNYGVDRHAKFVKQFADFSSGIDGLRQYNEEVK 
AGTFPSENHTYKKRIMDEVEQHD* 

Sequence 1419 

Contig_0599_pos_8950_8048,

is similar to (with p-value 1.0e-17) 

>sp:sp|Q50648|Y0BS_MYCTU HYPOTHETICAL 26.2 KD PROTEIN CY227.
28C. >gp:gp|Z77724|MTCY227_6 Mycobacterium tuberculosis H37R
v complete genome; segment 114/162. NID: g3261620. 
atgaatcgtaaaaagaaaggaaagaataaagttatgacaattgccattataggcccaggt 
gcagtgggtacaacgttagcttttgaattaaaaaaagttctaccagatacggaactcatc 
ggccggcaagataaattaatgacctatttcccagaaaatacttctaatggaagtaatgtt 

aaagtgacttcatttaatcatattaatcaaacttttgatgtcattatcatagcagttaaa
acacatcaattggatgacgtcattaaacaattacctaaaatcactcatgacgattcgctc 
attatcttagcacaaaatggctatggacagcttaataaacttccatatcaacatgtcttt 
caagcagtcgtctatattagcggacaaaaagttaacaacaatgttcaacatttcagagat 
taccaactatatattcaagatagcacactaactcgtcaattcaagcaaatggttcatcct
tccaaaatagaggtggttttacaagaaaatattgaaaaaagcatttggtataaattatta
gtgaatttaggtataaataccatcactgctattggacaacaaccagctaaaattttaaaa
tctcctcatattgagtcgttgtgtcgtcgtatattagttgatggtcttaaagttgctaga
gctgaacaaattgactttgaagatcatatcgttgatgatattttaaatatttataaaggt 
tatccagacgaaatgggaacaagtatgtattacgatgtcattaacaagcatcctcttgaa 

gtcgaggccatacagggttatatatataaatgtgcaaaaaaacatcatttagagacaccc
tatctagatatggcttatacatttttatacgcttatcaccttgaatacacacaaccagat 
tga 

Sequence 1420 

MNRKKKGKNKVMTIAIIGPGAVGTTLAFELKKVLPDTELIGRQDKLMTYFPENTSNGSNV
KVTSFNHINQTFDVIIIAVKTHQLDDVIKQLPKITHDDSLIILAQNGYGQLNKLPYQHVF
QAVVYISGQKVNNNVQHFRDYQLYIQDSTLTRQFKQMVHPSKIEVVLQENIEKSIWYKLL
VNLGINTITAIGQQPAKILKSPHIESLCRRILVDGLKVARAEQIDFEDHIVDDILNIYKG
YPDEMGTSMYYDVINKHPLEVEAIQGYIYKCAKKHHLETPYLDMAYTFLYAYHLEYTQPD

Sequence 1421 
Contig_0599_pos_1915_743,
is similar to (with p-value 4.0e-49)
>gp:gp|AL023702|SC1C3_2 Streptomyces coelicolor cosmid 1C3.
NID: g3169026. 

atgttatggaggtttgttatgggaagtttttttaatcggatgactcgaaaagagaatcct 
actatttatcaaagtaaagatgggcatcttaaacgcacattacgtgtacgcgactttctt 
gcactaggtgttggtacaattgtttctacatctatcttcactttaccaggtgttgtcgcg 
gctgagcatgccggacctgctgtttcattgtcattcttattagctgctattgtggcaggt
cttgtagcctttacttatgcagaaatggcatctacaatgccttttgctggttcagcttat 
tcatggattaatgttctttttggtgaattattcggatgggttgccggttgggcgctttta 
gcagaatactttattgcUgttgctttcgttgcttcaggcttttctgctaacttaagaggt 
cttattgcaccattgggcatttctttacctaaatcattatctaatccatttggaagtaac
ggtggtgtcattgatatcattgctgctgtagtgattattttaactgcattactattatca
cgcggaatgaacgaagccgctcgtatggaaaatgtattggttatattaaaggtgttggcc 
atcattttatttgtgattgttgggctaactgcgattaatttcagtaactatatacctttt 
attccagaacataaagttactgaaactggcgactttggaggttggcaaggtatttatgct 
ggagtttcaatgatttttttagcttatattggttttgactctattgctgctaattcagct
gaagcgattaatccacagaagacaatgcctagaggaatcttagggtcactcatagtagca 
attgtattgtttgtggccgtagcacttgttcttgttggcatgttccactactctcaatac 
gctgataatgcagagccagtaggttgggcattacgagaaagtggtcatggtattattgct 
gcaattgttcaagcaatttctgtcatcggtatgttcactgcattaatcggtatgatgctt 
gcaggttcacgtctattatattcatttggacgagatggtttactcccttcttggttaagt
caattgaatcacaaacatttacctaatcgagcacttgtcatacttacaatcattggcgta 
gttatcggatcaatgttccctagcaatcgctaa 

Sequence 1422 

MLWRFVMGSFFNRMTRKENPTIYQSKDGHLKRTLRVRDFLALGVGTIVSTSIFTLPGVVA
AEHAGPAVSLSFLLAAIVAGLVAFTYAEMASTMPFAGSAYSWINVLFGELFGWVAGWALL 
AEYFIAVAFVASGFSANLRGLIAPLGISLPKSLSNPFGSNGGVIDIIAAVVIILTALLLS 
RGMNEAARMENVLVILKVLAIILFVIVGLTAINFSNYIPFIPEHKVTETGDFGGWQGIYA 
GVSMIFLAYIGFDSIAANSAEAINPQKTMPRGILGSLIVAIVLFVAVALVLVGMFHYSQY 

ADNAEPVGWALRESGHGIIAAIVQAISVIGMFTALIGMMLAGSRLLYSFGRDGLLPSWLS
QLNHKHLPNRALVILTIIGVVIGSMFPSNR*

Sequence 1423 

Contig_0603_pos_1490_1990,

is similar to (with p-value 7.0e-38)

>gp:gp|AF082668|AF082668_1 Streptococcus pyogenes CsrR (csrR
) and CsrS (csrS) genes, complete cds. NID: g3599370.
atgcttccaaacataaatggtctagaaatttgtagacaaattcgtcaaaaaacaactact 
ccaattatcatcattactgcaaaaagcgagacatatgataaagtagctgggttggactat 

ggggcagatgactacattgtaaaaccctttgatatagaagaattgctcgcaagaataaga
gcggtattgcgcagacagccagataaagatgttttagatatcaatggtattatcattgat 
aaagatgcctttaaagttactgttaatggccatcaattagaattaactaaaacagaatac 
gatttattatatgttttagctgaaaatcgtaaccacgtcatgcagcgtgaacaaattctc 
gatcacgtatgggggtataatagtgaagtagaaacgaatgtcgttgatgtttacattcgt 

tatttacgtaataaactcaaaccttttaataaagaaaaatccatagaaacagtacgtggc
gtagggtatgtgattcgatga 

Sequence 1424 

MLPNINGLEICRQIRQKTTTPIIIITAKSETYDKVAGLDYGADDYIVKPFDIEELLARIR 
AVLRRQPDKDVLDINGIIIDKDAFKVTVNGHQLELTKTEYDLLYVLAENRNHVMQREQIL
DHVWGYNSEVETNVVDVYIRYLRNKLKPFNKEKSIETVRGVGYVIR*

Sequence 1425

Contig_0603_pos_2503_3357,
is similar to (with p-value 2.0e-42)

>gp:gp|U81166|LLU81166_1 Lactococcus lactis subsp. cremoris
MG1363 histidine kinase (llkinA) gene, complete cds. NID: g2
182834.

gtgagttatatcttttcttcgcaaattactaaaccgatagttacaatgtccaataaaatg 
aatcaaattagaagagatggttttcaaaataaacttgaattaactacaaattatgaagaa
acagataatttaattgatacttttaatgaaatgatgtatcaaatagaagaatcttttaat
cagcaacgtcaatttgtcgaggatgcttcacacgaattaagaacgccactgcagattatt
caaggtcatctaaatttaatccaacgttgggggaaaaaagatccagcagttttggaagaa 
tctttgaatatttcaattgaagaagtgaatcgaataacaaaacttgtcgaagaactactt 
ttacttaccaaagatagagtcaatcataatgttttggaatgtgaaaatgtagacgtaaat
agcgagattcaatcacgtgtgaagtcactgcaacacctacatccagattatacttttgaa 
acacatcttgctactaagcctatccaattaaaaattaaccgtcatcagtttgaacaactc 
ttactcatatttattgataatgcaatgaaatacgacactgaacataagcacattaaaatt 
gttactcaactaaaaaataaaatgattatgattgatattactgatcatggtatgggtata 
ccaaaagctgacttagaatttatctttgatagattttatcgtgtagataaatcacgtgct
cgtagtcaaggaggcaatggattaggactatcaatagcagaaaaaattgtgcaacttaac 
ggtggtatgattcaagtagaaagtgaactacaaaagtacacgactttcaaaatcagtttt 
ccagtactaaactaa 

Sequence 1426 

VSYIFSSQITKPIVTMSNKMNQIRRDGFQNKLELTTNYEETDNLIDTFNEMMYQIEESFN 
QQRQFVEDASHELRTPLQIIQGHLNLIQRWGKKDPAVLEESLNISIEEVNRITKLVEELL 
LLTKDRVNHNVLECENVDVNSEIQSRVKSLQHLHPDYTFETHLATKPIQLKINRHQFEQL 
LLIFIDNAMKYDTEHKHIKIVTQLKNKMIMIDITDHGMGIPKADLEFIFDRFYRVDKSRA
RSQGGNGLGLSIAEKIVQLNGGMIQVESELQKYTTFKISFPVLN* 

Sequence 1427 
Contig_0604_pos_824_1960, 

is similar to (with p-value 0.0e+00)

>sp:sp|P50840|YPSC_BACSU HYPOTHETICAL 43.5 KD PROTEIN IN COT
D-KDUD INTERGENIC REGION PRECURSOR. >gp:gp|L47838|BACPONAYPP
_15 Bacillus subtilis (clone YAC15-6B) ponA gene, yppBCDEFG
genes, ypqAE genes, yprAB genes, cotD gene, ypsABC genes, rn
aP gene, yptA gene, ypuA gene, kduDI genes, kdgRKAT genes, y
pwA gene, complete cds's. NID: g1146168. >gp:gp|Z99115|BSUB0
012_158 Bacillus subtilis complete genome (section 12 of 21)
: from 2195541 to 2409220. NID: g2634478.

atgtttcaattattagcagtatgtccaatgggattagaagcagttgtagctaaagaaata 

caagaattaggttacgatacacaagtagaaaacggtcgtatcttttttgaaggtgatgaa
agtgctattgttagatgtaacttatggttacgtaccgcagaccgaataaaaattgtaatg 
ggtcgattcaatgctactactttcgacgaattatttgagcaaacaaagtcattaccatgg 
gagaccgtaattgacaaagaaggtaatttcccggttcaagggcggagtgtaaaatcaaca 
ttatatagtgtacctgactgccaagcaattactaaaaaggctatagtagaacgacttaag 

catgctcatcaagaaaaaggatggctaagtgaaacaggcgccaaatatcctgtagaagta
gcaatattaaaagataatgtattattgaccattgatactgctggttctggactcaacaag 
cgaggttatcgcattgctcaaggtgaagcacccattaaagaaacattggcagcaagcctc 
attcgtttagcaaattggaatggtaacacacctttaattgatcctttctgtggttcaggc 
accattgctattgaagcatgtcttattgcacaaaatattgcacctggttttaatagggat 

tttgtatctgaacaatggaatatgatgccacctaatatttatgacaaatttcgtgatgaa
gctgatcaattggctgactatgataaagacatacaagtatatgcttctgatattgatcca
gaaatgattgaaattgcaaaacgtaatgcagaagaagttggtctcggagatatcatacag
ttcaatgttaaagatgtgaatactttgtctattgataccgacaagcctgtagcacttgtt
ggcaatccaccgtatggtgaaagaattggagatcgtgaagaagttgaagaaatgtatcgt 

tatataggtacacttttgaaacaacatcctcatttatcggcctacatattaacaagtaac
aaagaatttgaatacttagttaatcgaaaagcgactaaaagacgtaagttgttcaacggg 
tatattgaatgtacgtactatcagtattggggtaaaaagcaaagtaataaaaattaa 

Sequence 1428 

MFQLLAVCPMGLEAVVAKEIQELGYDTQVENGRIFFEGDESAIVRCNLWLRTADRIKIVM
GRFNATTFDELFEQTKSLPWETVIDKEGNFPVQGRSVKSTLYSVPDCQAITKKAIVERLK 
HAHQEKGWLSETGAKYPVEVAILKDNVLLTIDTAGSGLNKRGYRIAQGEAPIKETLAASL 
IRLANWNGNTPLIDPFCGSGTIAIEACLIAQNIAPGFNRDFVSEQWNMMPPNIYDKFRDE
ADQLADYDKDIQVYASDIDPEMIEIAKRNAEEVGLGDIIQFNVKDVNTLSIDTDKPVALV 

GNPPYGERIGDREEVEEMYRYIGTLLKQHPHLSAYILTSNKEFEYLVNRKATKRRKLFNG
YIECTYYQYWGKKQSNKN* 

Sequence 1429 
Contig_0604_pos_2092_2766, 
putative peptide of unknown function

atgtatcatattttaattagaaaggggcttttaaatatgaagactgttttgattgtaggc 
gcaaatggtagagtatctatcgaagcgacaaaaattttcctagagaactcaagatttaat 
gttgatttatttttgagaaatgcgcatcgtatacctgattacgcctctaatagaattaaa 
gtttatgagggagacgctaaaaatattgaggatttagaaagtgctttaaacaatgttgat 
gttgtttttgcaagtttatcgggatcacttgataaacaagctgaaactatcgtaaaagcc
atggataacaaaaatgttaagagactgatttttgtagcagctcctggtatttatgatgaa 
ctaccagaaccattcaatcaatggaataaagaacaatttggcgaaaaattgaatcgctac 
cgcaaagcatctgatattatagaaaattcagatttagattacacaataatacgtccaggc 
tggcttacagataaaaatgaaaatgtatatgagatcacagcaaagaacgaaacatttaaa
ggtactgaagtatcacgtaaaagtgtagcatctttagcagtacaaattgccaaaaaccca 
gaactacactctaaagaaaatattggtgtgaataaacctaatacagaaggtaataaacct 
gcttggttcaattag 

Sequence 1430

MYHILIRKGLLNMKTVLIVGANGRVSIEATKIFLENSRFNVDLFLRNAHRIPDYASNRIK
VYEGDAKNIEDLESALNNVDVVFASLSGSLDKQAETIVKAMDNKNVKRLIFVAAPGIYDE 
LPEPFNQWNKEQFGEKLNRYRKASDIIENSDLDYTIIRPGWLTDKNENVYEITAKNETFK 
GTEVSRKSVASLAVQIAKNPELHSKENIGVNKPNTEGNKPAWFN*

Sequence 1431 

Contig_0604_pos_3143_6580,

is similar to (with p-value 4.0e-81) 

>sp:sp|P54159|YPBR_BACSU HYPOTHETICAL 137.4 KD PROTEIN IN BC
SA-DEGR INTERGENIC REGION. >gp:gp|L77246|BACYACA_6 Bacillus
subtilis (YAC10-9 clone) DNA region between the serA and kdg
loci. NID: g1256615. >gp:gp|Z99115|BSUB0012_144 Bacillus su
btilis complete genome (section 12 of 21): from 2195541 to 2
409220. NID: g2634478.

atggcaataaatgaacaattagatactttatataaattaaaaaaagaattggaaaaatct
aataacagacctttaattaataccattaatcaagtaataaaaaaagtttatttaaatcaa
tacaccgcaacatttgtaggacatttctcagctggaaaatcaacgttaattaatttgtta 
ttagaacaagatatactaccaagttctccagtaccaacaacgagcaatacagctatcgtt 
tcagtagccaaggaagacgaaatcattgccaatcttacgcaacaacaatatacaaaatta 

aaaacgtacaacgatgttaaacaaatgaatcgacaaaatgtcgatgtagaatcgatagaa
attaattttccatcaaacaaatttaatctagggtttacatttcaagatacaccgggtgtt 
gactctaacgttgcaacgcatcagtcgagtacagagcaatttatgtatacaagtaacttg 
ttgttttatacagtagactataatcatgtccaatcagcgttgaactttaaatttatgaaa 
cgtattaatgaagttggtataccaattatttttgtgattaatcagattgataaacataat 

gaagaagaaattacatttgaaacttttaaatcaagagtcgaaaaatcaatcaaagactgg
gatatcaaacttcaagatacttattacgtttcaaagtttgatcatccacagaatgaaatt 
gacaaactttcaaattttctagtatttatggatcaacatcgtgaatcaacagaagactat 
gttaatagaaccattcaattcattaccgacgcacaatacatatacattcaaaatgaaatg 
caatctattcttgacacccttcaaattaatgaagaacaattcgaggaagcatatattcaa 

tttcaacaaaatcaagaagtcagcgcagaagcacaattgctcaatgactctaatcaatta
tttaattatttaaaacagaagcgtaaagatatattagataatgcttatatcatgacgtac 
gatatgcgtgaatctttacggaattatttagaaagcatggcaactgattttaaagtgaat 
ggattttttaataaaaggaagaaaaaagaagaagaacaaatcaaacgacttaatgaggcg 
accactcaattgcaagagaaagttaatcaacaagtacgacaaccacttcgtgaagatatg 

tcatttttaactagattcataaataaacatgctgtgaatgaaaaaatactaaatcaagaa
tatgacgtcgttccgtcacttatatcagagctatatcaaactcaaacgagcattagcaac 
acatacgttttaacattttcagatgaagttataaaagctttgaataaaaaaatagaaaat 
gagtcaacaccactatttgaagaagctgtcaatcatgtacaagttaatgaattatcgagt 
gatgaaaatgaagataggtatgaatatgatagatacattgaacttaacacattaaaggat 

tcgcttacatcccacaactacaaacattactatatccatttagacgattctttagataaa
ttaattggaagaacagagactcattttgaacttaaacaagaaaattcaactgcttatcat 
cgtaaacatgagacacaacatcgtaacgagtttgttacatctaatcaagatattaagcgt 
gcattagatatcgttaaagatgtaccattatttgatcgcactaaacaagatatcaccgat 
accattctgagactcgataatcaaataacaaaagttggtgtttttggtacatttagcgca 

ggtaaaagcagcttaatcaacgctctactaggtgaaaactatttagttagttctcctaat
ccaacaacagcggcaacgacggaattatcatatggtaaagagagtcaaatcacattaaaa 
tcgaaagaacaattactagaggaagttaatcatgtactagaattttatgaaatatcgttt 
aacacattagacgactttattgagagtgatttagataagttaaaattgaaactagaaaag 
aaccaacttgcatttattagtgcaattgagaaacattatgaaatgtacacatctatgtta 
gaacattcacttatacacacagtatcgcttgaagaaattaaaaaatggagtgccgaggat
gagtatgctactttcgtgaaaactgtacaccttaagctacctttagattggctcaagggt 
aaaatcattattgattcactcggtttacattccaataatcaaagacacacgaatgaaact 
gaacaaattctcacttcttctgaccttattttgtatgtaacttacttcaatcattcattt 
acagataacgacaaagcctttatcgaacatatgaaagatatgaatcaattaaatgaaaat
caagcttttaaaatgataattaatgctgttgacctagctgaagataaacaggatatacaa 
gctgttgaggattatgttgcagatgcactggggcaagttaacatacactctgaaatatat 
agtgtttcaagtcgtcaaagtttaaatgggaataacattggcataaatgaattaagagaa 
agtatacaatactttgcaaaggttgaatccagaacaattttagagcaacaaatgacttat 
caattgcaacaaatgaatacgtcctttaaaaacatgattaaagattttcatgatgacaac
gcaaaattgtcagcgagacaacataaattaaatcactataaaaatcaaacccggttaaat 
caagagttgattgatacaactgcacaacgtacttttaatgaagtagaagaacaagtatat 
catctaaatgaacggttaaaactacaacttttagatgaggttaaatctgtatttaatagt 
cagatgacacaaaataacgacttcaatgaggaaaagaaaatttcaactaaaatatattta 
gatcaaattcatcaacgcttattcttagaacaatcacttattacagaaaggattaaaaaa
tattttaattcacaactagaagaacaaatcataccagtcatgaaaaagttaaatcagatt 
catgttattataaatgcaaaatttaatgtggagccatcaatcgttgatacgccattactt 
caaattgaacttaattcaatgttgcaatcactaccaaaacagttaactaaacgtaaaata 
gttaatccaaagtcacaaaaggatattcaagaacacatagctaatcaaactcttgaatta 
ttacaagatgatttgaactcattgcgtcgacaattaaatgattatatccacgagatgact
caacttgccgaacatcaatttcaaatgttggagacatccatccaacaacaaattgatgag
ttactttcattcactatcgatgatacattaatacaacaactagaattgaaaaccacacaa 
ttagataatattttatag 

Sequence 1432

MAINEQLDTLYKLKKELEKSNNRPLINTINQVIKKVYLNQYTATFVGHFSAGKSTLINLL
LEQDILPSSPVPTTSNTAIVSVAKEDEIIANLTQQQYTKLKTYNDVKQMNRQNVDVESIE
INFPSNKFNLGFTFQDTPGVDSNVATHQSSTEQFMYTSNLLFYTVDYNHVQSALNFKFMK
RINEVGIPIIFVINQIDKHNEEEITFETFKSRVEKSIKDWDIKLQDTYYVSKFDHPQNEI
DKLSNFLVFMDQHRESTEDYVNRTIQFITDAQYIYIQNEMQSILDTLQINEEQFEEAYIQ
FQQNQEVSAEAQLLNDSNQLFNYLKQKRKDILDNAYIMTYDMRESLRNYLESMATDFKVN
GFFNKRKKKEEEQIKRLNEATTQLQEKVNQQVRQPLREDMSFLTRFINKHAVNEKILNQE
YDVVPSLISELYQTQTSISNTYVLTFSDEVIKALNKKIENESTPLFEEAVNHVQVNELSS 
DENEDRYEYDRYIELNTLKDSLTSHNYKHYYIHLDDSLDKLIGRTETHFELKQENSTAYH 
RKHETQHRNEFVTSNQDIKRALDIVKDVPLFDRTKQDITDTILRLDNQITKVGVFGTFSA 
GKSSLINALLGENYLVSSPNPTTAATTELSYGKESQITLKSKEQLLEEVNHVLEFYEISF 
NTLDDFIESDLDKLKLKLEKNQLAFISAIEKHYEMYTSMLEHSLIHTVSLEEIKKWSAED 
EYATFVKTVHLKLPLDWLKGKIIIDSLGLHSNNQRHTNETEQILTSSDLILYVTYFNHSF
TDNDKAFIEHMKDMNQLNENQAFKMIINAVDLAEDKQDIQAVEDYVADALGQVNIHSEIY
SVSSRQSLNGNNIGINELRESIQYFAKVESRTILEQQMTYQLQQMNTSFKNMIKDFHDDN 
AKLSARQHKLNHYKNQTRLNQELIDTTAQRTFNEVEEQVYHLNERLKLQLLDEVKSVFNS 
QMTQNNDFNEEKKISTKIYLDQIHQRLFLEQSLITERIKKYFNSQLEEQIIPVMKKLNQI 
HVIINAKFNVEPSIVDTPLLQIELNSMLQSLPKQLTKRKIVNPKSQKDIQEHIANQTLEL 
LQDDLNSLRRQLNDYIHEMTQLAEHQFQMLETSIQQQIDELLSFTIDDTLIQQLELKTTQ
LDNIL* 

Sequence 1433

Contig_0604_pos_6599_7477,
is similar to (with p-value 2.0e-48) 
>gp:gp|U78771|LLU78771_2 Lactococcus lactis DNA polymerase I
(polA) gene, complete cds. NID: g2281292.
atgcctcaaagagtattgctcgttgatggaatggctttattatttagacatttttatgca 
acaagcttacacaatcaatttatgtacaattctaaaggaattcctacaaacggcattcaa 
ggttttgtaagacatatttttagcgctatcaaagaaatcgaacccactcacgtagcagtt 
tgttgggatatggggcaagaaacattcagaaatgaaatgtatgatggttacaaacaaaat
cgcccagcaccacctgatgaacttatacctcaatttgattatgttaaagaaatatcgcat 
cagtttggttttgtaaatgttggtaagcgtaattatgaagctgatgatattattggtagt 
ttagcggaaacatattcacaagaacatgaagtttatatcattaccggagaccgagactta 
ctccaatgcattaatcataatgtagaagtttggcttattaaaaaaggttttacaatatat 



caacgttacacgcttgatcgttttattgatgaatatggactaaatcctcaacaattaata
gatgtcaaagcttttatgggtgatactgcagatggctattctggtgtaaaagggataggt 
gaaaaaacagcaattaaattaattcaaaatcatggaactgtcgaaaatgtagtgaacaat 
ttatcatcattaactcccgctcaacagaaaaaaataacaaataatttaaatcatctgcat 
ttatcaaaatcactcgcagaaatatataccaaagttccaattgaaacagacaaacttttt
aaagagatgacatatgctcatacactaaatgagattttatccatttgtaatgaacatgaa 
ctatacgtttcaagtaaatatattgcaactcacctctaa 

Sequence 1434

MPQRVLLVDGMALLFRHFYATSLHNQFMYNSKGIPTNGIQGFVRHIFSAIKEIEPTHVAV
CWDMGQETFRNEMYDGYKQNRPAPPDELIPQFDYVKEISHQFGFVNVGKRNYEADDIIGS
LAETYSQEHEVYIITGDRDLLQCINHNVEVWLIKKGFTIYQRYTLDRFIDEYGLNPQQLI
DVKAFMGDTADGYSGVKGIGEKTAIKLIQNHGTVENVVNNLSSLTPAQQKKITNNLNHLH 
LSKSLAEIYTKVPIETDKLFKEMTYAHTLNEILSICNEHELYVSSKYIATHL* 


Sequence 1435

Contig_0604_pos_7659_0,

putative peptide of unknown function 

atgaagagcaaaccgaaattaaacggtcggaacatctgctcttttttattgagcaaatgt 

atgagttattcattgtcaaaattatcaacattaaaaacgtataattttcaaatcacatca
aacaacaaagaaaaaacatcaagaataggagtggcaatagctttgaataatcgtgataaa 
ttacaaaaatttagtattcgaaaatacgcaattggaacattttctactgtgattgcaaca 
cttgtgttcatgggtatcaatacaaaccatgcaagtgccgacgagttgaatcaaaatcaa 
aagttaattaaacaattaaatcaaacagatgatgatgattcgaatacgcatagtcaagaa 

atcgaaaataacaaacaaaattctagtgggcagactgaatcattacgttcatcaactagt
caaaatcaagcaaatgcacgactgtcggatcaattcaaagacactaatgaaacatcgcaa 
caattacctacaaatgtttcggatgatagtatcaatcaatcgcatagtgaagcaaatatg 
aataacgaaccattgaaagttgataatagtactatgcaagcacatagtaaaatagtaagc 
gatagcgatgggaatgcttctgaaaataaacatcataaactaacagaaaatgtacttgca 

gaaagccgagcaagtaaaaatgacaaagagaaagagaatctacaagagaaagataaatcg
cagcaagtacatccaccattagataaaaatgcattacaagctttttttgacgcatcatat 
cacaattacagaatgattgatagagatcgtgcggatgcaacagaatatcaaaaagtcaaa 
tctacttttgactacgtcaatgacttactaggtaataatcaaaatattccttcagaacag 
cttgtttcggcatatcaacaattagagaaagcattagaacttgcacgtacgttaccacaa 

cgatctactacagaaaaacgtggtagaagaagtacgagaagtgttgttgagaatcgttca
tcaagaagcgattacttagatgctagaactgaatattatgtttcaaaagacgatgatgat
tctggtttccctcctggtactttcttccatgcttcaaatagaagatggccttataattta
ccaagatctaggaacatcttacgtgcttctgatgtacaaggtaatgcttatatcactaca
aaacgacttaaagatggatatcaatgggatattttatttaatagtaatcataaagggcat 

gaatatatgtactattggtttggacttccaagtgatcaaacaccaactggtccagtaact
ttcactattatcaaccgtgatggttcaagtacatctactggtggcgttggatttggatca 
ggtgcaccactacctcaattttggagatcagcaggtgctattaattctagcgtagcgaat 
gattttaaacatggctccgctacaaattatgcattttatgatggtgttaataatttttct 
gactttgctagagggggagaattatacttcgacagagaaggcgctacacaaactaataaa 

tattatggcgatgaaaacttcgcattgctaaatagtgagaaaccagatcaaataagagga
ttagatacaatatatagttttaaaggtagtggtgatgtaagttatcgtatttcatttaaa 
actcaaggagctccaactgcaagattgtattatgctgctggcgcgcgttctggtgaatat 
aaacaagcaacgaactataaccaactctatgtcgaaccttataagaattatcgaaatcga 
gtacagtcaaatgtccaagttaaaaatcgtacacttcatttaaaaagaacaatcagacaa 

ttcgatcctacattacagagaactactgatgttcctattttggatagtgacggttccgga
agtattgatt 

Sequence 1436 

MKSKPKLNGRNICSFLLSKCMSYSLSKLSTLKTYNFQITSNNKEKTSRIGVAIALNNRDK 
LQKFSIRKYAIGTFSTVIATLVFMGINTNHASADELNQNQKLIKQLNQTDDDDSNTHSQE
IENNKQNSSGQTESLRSSTSQNQANARLSDQFKDTNETSQQLPTNVSDDSINQSHSEANM
NNEPLKVDNSTMQAHSKIVSDSDGNASENKHHKLTENVLAESRASKNDKEKENLQEKDKS 
QQVHPPLDKNALQAFFDASYHNYRMIDRDRADATEYQKVKSTFDYVNDLLGNNQNIPSEQ 
LVSAYQQLEKALELARTLPQRSTTEKRGRRSTRSVVENRSSRSDYLDARTEYYVSKDDDD 
SGFPPGTFFHASNRRWPYNLPRSRNILRASDVQGNAYITTKRLKDGYQWDILFNSNHKGH
EYMYYWFGLPSDQTPTGPVTFTIINRDGSSTSTGGVGFGSGAPLPQFWRSAGAINSSVAN
DFKHGSATNYAFYDGVNNFSDFARGGELYFDREGATQTNKYYGDENFALLNSEKPDQIRG 
LDTIYSFKGSGDVSYRISFKTQGAPTARLYYAAGARSGEYKQATNYNQLYVEPYKNYRNR 
VQSNVQVKNRTLHLKRTIRQFDPTLQRTTDVPILDSDGSGSIDX

Sequence 1437 

Contig_0604_pos_1816_1418,

is similar to (with p-value 4.0e-34) 

>sp:sp|P50840|YPSC_BACSU HYPOTHETICAL 43.5 KD PROTEIN IN COT
D-KDUD INTERGENIC REGION PRECURSOR. >gp:gp|L47838|BACPONAYPP
_15 Bacillus subtilis (clone YAC15-6B) ponA gene, yppBCDEFG
genes, ypqAE genes, yprAB genes, cotD gene, ypsABC genes, rn
aP gene, yptA gene, ypuA gene, kduDI genes, kdgRKAT genes, y
pwA gene, complete cds's. NID: g1146168. >gp:gp|Z99115|BSUB0
012_158 Bacillus subtilis complete genome (section 12 of 21)
: from 2195541 to 2409220. NID: g2634478.

atgaggatgttgtttcaaaagtgtacctatataacgatacatttcttcaacttcttcacg 
atctccaattctttcaccatacggtggattgccaacaagtgctacaggcttgtcggtatc 
aatagacaaagtattcacatctttaacattgaactgtatgatatctccgagaccaacttc
ttctgcattacgttttgcaatttcaatcatttctggatcaatatcagaagcatatacttg
tatgtctttatcatagtcagccaattgatcagcttcatcacgaaatttgtcataaatatt
aggtggcatcatattccattgttcagatacaaaatccctattaaaaccaggtgcaatatt 
ttgtgcaataagacatgcttcaatagcaatggtgcctga 


Sequence 1438 

MRMLFQKCTYITIHFFNFFTISNSFTIRWIANKCYRLVGINRQSIHIFNIELYDISETNF
FCITFCNFNHFWINIRSIYLYVFIIVSQLISFITKFVINIRWHHIPLFRYKIPIKTRCNI
LCNKTCFNSNGA* 


Sequence 1439 

Contig_0605_pos_816_1457,

is similar to (with p-value 4.0e-52) 

>gp:gp|Z99122|BSUB0019_48 Bacillus subtilis complete genome
(section 19 of 21): from 3597091 to 3809700. NID: g2636029.
>gp:gp|U56901|BSU56901_2 Bacillus subtilis putative transcri
ptional regulator (yvhJ), Ycr59c/YigZ homolog (yvhK), histid
ine kinase (degS), transcriptional regulator of degradation e
nzyme (degU), (degV), (comFA), (comFB), (comFC), flagellar p
rotein (yviB), negative regulator of flagellin (flgM), flage
llar protein (yviC), flagellar-hook associated protein 1 (fl
gK), flagellar-hook associated protein 3 (flgL), (yviE), tra
nsmembrane protein (yviF), (csrA), flagellin (hag), flagella
r protein (yviH), flagellar hook-associated protein 2 (fliD)
, flagellar protein (fliS), flagellar protein (fliT), sigma-
54 modulator homolog (yviI), and (secA) genes, complete cds.
NID: g1762326.

atggataaatccataattactattaaacaagcacattcaattgaaaatgtgataagtaaa 
tcacgctttatagcatatattaagcctgtttcgactgaaaatgaagcaaaagcttttata 

gatgaaattaaaacaaaacataaagatgcaactcataattgttcagcctatactgtcgga
ccagagatgaatattcaaaaggcaaacgacgatggcgaaccaagtggaacagctggcatc 
ccaatgcttgaaatactgaaaaaacaagagatacacaatgtttgtgtcgtcgtgacacgc 
tacttcggtggtatcaagttaggtgcaggcggtcttattagagcatatagcggcgccgtg 
cgtgatgtgatatatgatataggtagagtcgaactaagagaagctattccagtaaccgtt 

acgttagattatgatcagacaggtaaatttgaatatgaacttgcctctactacattctta
ttaagagaacaattttataccgataaagtaagttatcaaattgacgtagtaaaaaatgaa
tatgatgcttttatagactttttaaatcgaattacttctggaaattatgatttgaaacaa 
gaagaccttaaactattaccttttgatattgaaaccaattaa 
Sequence 1440

MDKSIITIKQAHSIENVISKSRFIAYIKPVSTENEAKAFIDEIKTKHKDATHNCSAYTVG 
PEMNIQKANDDGEPSGTAGIPMLEILKKQEIHNVCVVVTRYFGGIKLGAGGLIRAYSGAV 
RDVIYDIGRVELREAIPVTVTLDYDQTGKFEYELASTTFLLREQFYTDKVSYQIDVVKNE 
YDAFIDFLNRITSGNYDLKQEDLKLLPFDIETN*

Sequence 1441

Contig_0605_pos_3202_3879, 

putative peptide of unknown function 

gtgggcattttagtatcggggtcagggatagcgagtgtacaaacaaatataactcacgca
aaagaaagtcacgattcaactcctcaaaatattaaattagtgggaacgtatgatacttct 
caagttgattccaaaacgatgaaacaatttaaagaaatagaaaaagaagataataatttc
cacataactaaacatggaaataaagtcgttgtagaagacaaattacctaatccagagaat 
aaaacttcaagttattcagctgatggtagtgctgaaaataatacaaaagtaattaatttc 

tctgattttgttggtaatatggatgggaaagatgatggaaaaatatcggatgggataacc
ttttatagtggtaaatcatataacggacaacacgatggtcaaaaagtaaaaaaagggact 
catgtacattgtaatagatttaacggaacaaaatctgatcatagatactggtcaaaaaaa 
catcctagagcttatgtagatttttataaaagtgattgctggtatcacgccaaagcttat 
aaatgttcttccttgggaaaaatgactaaatgcgatggtttgaatagtatttatagaaaa

ggtgtcaaagattgctcatcatggaaaggtaaacccaaacataaaaactggcctaaaaca
gcatggtatagaaattaa 

Sequence 1442

VGILVSGSGIASVQTNITHAKESHDSTPQNIKLVGTYDTSQVDSKTMKQFKEIEKEDNNF 
HITKHGNKVVVEDKLPNPENKTSSYSADGSAENNTKVINFSDFVGNMDGKDDGKISDGIT
FYSGKSYNGQHDGQKVKKGTHVHCNRFNGTKSDHRYWSKKHPRAYVDFYKSDCWYHAKAY 
KCSSLGKMTKCDGLNSIYRKGVKDCSSWKGKPKHKNWPKTAWYRN*

Sequence 1443 
Contig_0605_pos_3883_4245,

putative peptide of unknown function 

atgaataaaatcttaaaaatattaataacttctattattgttatcattattaccttaaca 
gtttggacttttagtgtgattacttatcagaaacacaagagtgagaaaatcatcaatcac 
gttatagaacgtaagggttgggataaaaaaataaaaaatgaaaaaatgagttttaatatt 
ataatgggatatgctgaaaaagatattgtttttaaagatcaaccatatagtgagtatgag
tataacgtgacaccagcaccatggacagatgataaagaatataaggtgtggggggaaaca 
gatttacaaaagaaagactcctattataaatatcttttagaatcagaaccttacagaaaa 
taa 

Sequence 1444

MNKILKILITSIIVIIITLTVWTFSVITYQKHKSEKIINHVIERKGWDKKIKNEKMSFNI

IMGYAEKDIVFKDQPYSEYEYNVTPAPWTDDKEYKVWGETDLQKKDSYYKYLLESEPYRK 
*

Sequence 1445

Contig_0605_pos_4375_4740,

putative peptide of unknown function 

atgaaatggttaaaaatcattatattaattttatctttagttccaatagaatttatcgga 
ttatttactgattatcaaacaggtttattaataggttacatcccatttatagtagttgct 
atattaatgagtatatccatttttaaatttggctttaaaaataatataagtattgtaata
ggtagatgcattggaatatttctttcttggggatgtgttcatttatttatgaacatttac 
gattcatcagattattttaaacctcttaccacagatattttcgcattatttttaggagct 
attcatttcatagttattatgcttatatatttagttatctatggtttttctcatcgcaat 
aattaa 


Sequence 1446 

MKWLKIIILILSLVPIEFIGLFTDYQTGLLIGYIPFIVVAILMSISIFKFGFKNNISIVI 
GRCIGIFLSWGCVHLFMNIYDSSDYFKPLTTDIFALFLGAIHFIVIMLIYLVIYGFSHRN
N* 
Sequence 1447

Contig_0605_pos_5855_5520, 
is similar to (with p-value 3.0e-38) 
>gp:gp|AF051916|AF051916_2 Staphylococcus aureus plasmid pJE
1 remnant of replication protein Rep (rep), trimethoprim res
istance protein DfrA (dfrA), thymidylate synthetase ThyE (th
yE), and putative transposase Tnp (tnp) genes, complete cds;
and unknown gene. NID: g3676404.

atgaatgatataactaaacgtttattaaaaccaataattaatgagctttcttcaattttt
aataaccttcatattaataagatcaaagctaaaaaaggacgtaaaattgaatggttagag 
tttacctttgacgctgagaaacgcattcacaacaagcgacaaccacaaatgactaatata 
ggtaagtcgcgccaatataccaatcgtgaaaaaacacctaaatggttagacgaaaagata 
tataaacaatctcaagagatacataatgaagatgcaaaattaaaacaagatcgagaggca 

tttcaacgtcaattagaagaaaaatgggaggaataa

Sequence 1448

MNDITKRLLKPIINELSSIFNNLHINKIKAKKGRKIEWLEFTFDAEKRIHNKRQPQMTNI 
GKSRQYTNREKTPKWLDEKIYKQSQEIHNEDAKLKQDREAFQRQLEEKWEE*


Sequence 1449

Contig_0605_pos_2569_1520, 

is similar to (with p-value 0.0e+00) 

>pir:pir|A55856|A55856 llm protein - Staphylococcus aureus

atgatagtcagtttaataattacacccattattattgtaatatcaaaaaaattagattta
gtagatcgtcctaatttcagaaaagtacatacgaaacctatctcagtgatgggaggaacg 
gtcattttattttctttcttaatagggatttggctcggacaccctattgaacgtgaggtt 
aaaccgcttatattaggtgcaattacaatgtatatggttggattgattgatgatatttac 
gatctaagaccttatttaaagttagcaggtcaaattgttgcagctttaattgttacgttt

tatggaattacaatagactttatttcattgccaattggtccaacgattcattttggcata
ttcagcattcctattacagtaatatggattgtagcaattaccaatgctattaatcttatc 
gacggacttgatggacttgcctcaggcgtctcagcattggcattaatgactattggattc 
atcgctattttacaagcgaacatatttattatcatgatttgctgtgtacttttagggtct 
ttacttggtttcttattctataactttcacccagcgaaaattttcctaggtgatagtggt 

gcattaatgataggatttattatcggtttcttatccttactcggctttaagaatatcaca
tttattgcattattctttcctatagttatattagcggtgccatttattgatacattattt 
gcaatgattcgtcgaatgaaaaaagggcaacatataatgcaagcggacaagtcacattta 
catcataaattacttgctttaggatatacgcatagacaaaccgttttacttatttattca
atagcgattatgtttagtttatctagtgttatcctctatttatcccaaccgttgggtgca 

cttatgatgttcattctcattgtctttacgattgagttgatcgttgaatttactggatta
atagatgataattatcgaccaatattaaatttaattacaaaaaaaggaaatggtaagcaa 
catcattatgatgagcatcaccgttcataa 

Sequence 1450 

MIVSLIITPIIIVISKKLDLVDRPNFRKVHTKPISVMGGTVILFSFLIGIWLGHPIEREV
KPLILGAITMYMVGLIDDIYDLRPYLKLAGQIVAALIVTFYGITIDFISLPIGPTIHFGI
FSIPITVIWIVAITNAINLIDGLDGLASGVSALALMTIGFIAILQANIFIIMICCVLLGS
LLGFLFYNFHPAKIFLGDSGALMIGFIIGFLSLLGFKNITFIALFFPIVILAVPFIDTLF
AMIRRMKKGQHIMQADKSHLHHKLLALGYTHRQTVLLIYSIAIMFSLSSVILYLSQPLGA 

LMMFILIVFTIELIVEFTGLIDDNYRPILNLITKKGNGKQHHYDEHHRS*

Sequence 1451 
Contig_0605_pos_0_673, 
is similar to (with p-value 2.0e-43) 
>sp:sp|P32436|DEGV_BACSU DEGV PROTEIN. >pir:pir|S28596|D3019
1 hypothetical protein U3 - Bacillus subtilis >gp:gp|Z18629|
BSCOMFG_1 B. subtilis comF gene. NID: g39847. >gp:gp|Z99122|B
SUB0019_45 Bacillus subtilis complete genome (section 19 of
21): from 3597091 to 3809700. NID: g2636029. >gp:gp|U56901|B
SU56901_5 Bacillus subtilis putative transcriptional regulat
or (yvhJ), Ycr59c/YigZ homolog (yvhK), histidine kinase (deg
S), transcriptional regulator of degradation enzyme (degU), (
degV), (comFA), (comFB), (comFC), flagellar protein (yviB),
negative regulator of flagellin (flgM), flagellar protein (y
viC), flagellar-hook associated protein 1 (flgK), flagellar-
hook associated protein 3 (flgL), (yviE), transmembrane prot
ein (yviF), (csrA), flagellin (hag), flagellar protein (yviH
), flagellar hook-associated protein 2 (fliD), flagellar pro
tein (fliS), flagellar protein (fliT), sigma-54 modulator ho
molog (yviI), and (secA) genes, complete cds. NID: g1762326.
atgaagattgcagttatgaccgattctacaagttatttaccacaacatataatagaacaa 
tataacataccagtcgcttcactaagtgtaactttcgatgatggagtgaatttcactgag 
agtgatgatttttctgtagatgatttttataaaaaaatggcttcatctaaaactatacca 

acaacaagccaacctgctattggcgattggattgaaaattttgagagattaagagaacaa
ggatacactgatgtcatcgtgattaacttatcaagtggtataagcggaagctatccttca 
gcaacacaagctggtgaaatggttgaagatattcaagtacatacgtttgatagccgtctt 
gctgcgatgattgaaggtagctttgcaatttacgctgctcaattggtacaaaagggatat 
aaacctgatgatattattaatgaactaactgaaataagacaacatattggtgcatactta 

attgttgatgatttaaaaaatttacaaaaaagtggtcgtatcactggagctcaagcttgg
gtaggtacattattgaaaatgaaacctgtcttgcgttttgaagaagatggtaaaatacat 
ccacacgaaaaagtacgtactaaaaaacgtgcgctaaaatctttagaaacaaacattttt 
aaagaaatagaGT 

Sequence 1452

MKIAVMTDSTSYLPQHIIEQYNIPVASLSVTFDDGVNFTESDDFSVDDFYKKMASSKTIP 
TTSQPAIGDWIENFERLREQGYTDVIVINLSSGISGSYPSATQAGEMVEDIQVHTFDSRL 
AAMIEGSFAIYAAQLVQKGYKPDDIINELTEIRQHIGAYLIVDDLKNLQKSGRITGAQAW
VGTLLKMKPVLRFEEDGKIHPHEKVRTKKRALKSLETNIFKEIEX 


Sequence 1453 

Contig_0607_pos_339_716,

is similar to (with p-value 2.0e-41) 

>sp:sp|P18158|GLPD_BACSU AEROBIC GLYCEROL-3-PHOSPHATE DEHYDR
OGENASE (EC 1.1.99.5). >pir:pir|C45868|C45868 glycerol-3-pho
sphate dehydrogenase (EC 1.1.99.5) - Bacillus subtilis >gp:g
p|M34393|BACGLPKD_3 B. subtilis glycerol kinase (glpK) and gl
ycerol-3-phosphate dehydrogenase (glpD) genes, complete cds.
NID: g142990. >gp:gp|Z99108|BSUB0005_198 Bacillus subtilis
complete genome (section 5 of 21): from 802821 to 1011250. N
ID: g2633055. >gp:gp|Z99109|BSUB0006_5 Bacillus subtilis com
plete genome (section 6 of 21): from 999501 to 1209940. NID:
g2633260. >gp:gp|Y14079|BSY14079_5 Bacillus subtilis chromo
somal DNA, region 75 degrees: glpPFKD operon and downstream.
NID: g2226133.

atgtcattatctacattgaaaagggatcatattaaaaagaatttaagagacactgaatac 
gatgttgttatcgtaggtggcggtattacaggtgcaggtattgctttagatgcaagtaat 
cgtgggatgaaggtagctttagtagagatgcaagactttgcacaaggtacaagttcacgc 
tcaactaaacttgtacacggtggtttaagatatttaaaacaactgcaagtaggggtagtt 

gcagaaacaggtaaagaacgtgctattgtttatgaaaatggtccacatgtgacaacacca
gaatggatgcttttacctatgcataaaggtggtacatttggtaaattctcaacttctatt 
ggactagctatgtactga 

Sequence 1454

MSLSTLKRDHIKKNLRDTEYDVVIVGGGITGAGIALDASNRGMKVALVEMQDFAQGTSSR
STKLVHGGLRYLKQLQVGVVAETGKERAIVYENGPHVTTPEWMLLPMHKGGTFGKFSTSI 
GLAMY* 

Sequence 1455 
Contig_0607_pos_2046_3350,

is similar to (with p-value 0.0e+00) 

>sp:sp|P18158|GLPD_BACSU AEROBIC GLYCEROL-3-PHOSPHATE DEHYDR
OGENASE (EC 1.1.99.5). >pir:pir|C45868|C45868 glycerol-3-pho
sphate dehydrogenase (EC 1.1.99.5) - Bacillus subtilis >gp:g
p|M34393|BACGLPKD_3 B. subtilis glycerol kinase (glpK) and gl
ycerol-3-phosphate dehydrogenase (glpD) genes, complete cds.
NID: g142990. >gp:gp|Z99108|BSUB0005_198 Bacillus subtilis
complete genome (section 5 of 21): from 802821 to 1011250. N
ID: g2633055. >gp:gp|Z99109|BSUB0006_5 Bacillus subtilis com
plete genome (section 6 of 21): from 999501 to 1209940. NID:
g2633260. >gp:gp|Y14079|BSY14079_5 Bacillus subtilis chromo
somal DNA, region 75 degrees: glpPFKD operon and downstream.
NID: g2226133.

atgtacgatcgtctagctggtgtcaaaaaatccgaacgtaaaaaaatgttatctaagcaa
gaaacgttaaataaagaacctttagttaaacgtgatggattaaaaggcggtggctactat 
gtggaataccgcactgatgatgcgcgtttaactattgaagttatgaaaaaagctgctgaa 
aatggagcagaaatcattaattatacaaaatcagaacacttcacttatgattccaataag 
aaagtaaatggtattgaagtattggatatgattgatggcgaaacgtatgcgattaaagct 

aaaaaagttattaatgcttctggtccttgggttgatgaagtgagaagtggcgattatgca
cgtaacaataagcaattaagattaactaaaggtgtacacgttgttatagatcaatctaaa 
ttcccattaggtcaagcagtttactttgatactgaaaaagacggacgcatgatttttgcg 
attccacgtgaaggaaaagcttatgtaggaacaactgacacgttttatgataatgaaaaa 
gcaacacctttaacaacacaagaagatagagactacttaattaatgcaattaactatatg 

ttcccaacagttaatgttaaagatgaagatattgaatcaacatgggctggtattcgtccg
ctaattcttgaaaaaggtaaagatccttctgaaatctcacgtaaagatgaagtttgggaa 
ggtgaatctggattattaactatagcaggcggtaaattaactggttatcgtcatatggca 
ctagaaattgttgatttattagctaaacgtttaaaacaagaatacggattgaaatttgaa 
tcatgtgccacaaaaaatctaaaaatttccggtggtgacgttggcggaagcaaaaacttt 

gaacactttgttgaacaaaaagttgatgcagctaaaggatttggaattgatgaagatgtg
gcacgtcgcttagcaagtaaatatggttcaaatgttgatcaactatttaatattgctcaa 
acggcaccatatcatgatagtaaattaccattagaaatttatgttgaattagtttatagt 
attcaacaagaaatggtttacaaaccaactgacttcttagtacgtcgttctggcaaatta 
tactttaatattcaagatgtgttagattataaaaatgctgtgatagatgttatggcggat

atgcttaattatagtgaaactcaaaaagaagcttatactgaagaagtagaagttgcgatt
gatgaggcacgtacaggtaatgatcaacctgcaactaaagcttaa 

Sequence 1456

MYDRLAGVKKSERKKMLSKQETLNKEPLVKRDGLKGGGYYVEYRTDDARLTIEVMKKAAE 
NGAEIINYTKSEHFTYDSNKKVNGIEVLDMIDGETYAIKAKKVINASGPWVDEVRSGDYA
RNNKQLRLTKGVHVVIDQSKFPLGQAVYFDTEKDGRMIFAIPREGKAYVGTTDTFYDNEK 
ATPLTTQEDRDYLINAINYMFPTVNVKDEDIESTWAGIRPLILEKGKDPSEISRKDEVWE 
GESGLLTIAGGKLTGYRHMALEIVDLLAKRLKQEYGLKFESCATKNLKISGGDVGGSKNF 
EHFVEQKVDAAKGFGIDEDVARRLASKYGSNVDQLFNIAQTAPYHDSKLPLEIYVELVYS
IQQEMVYKPTDFLVRRSGKLYFNIQDVLDYKNAVIDVMADMLNYSETQKEAYTEEVEVAI
DEARTGNDQPATKA* 

Sequence 1457 
Contig_0607_pos_3625_4500,

putative peptide of unknown function

gtgaaaattgataaagcaaaaaagagtactattggtattgttcatcttttccatggcatg 
gctgaacatatggaccgttatcaagagttagttgaggctttaaatacacaaggttatgac 
gttgttagacataatcaccgtggacatggtaaagaaatagatgagaatgaacgtggtcat 
tttaatagcatgaatcaaattgtagatgatgcttatgaaattattgagacattatatctt 

gaagagctcaatgtaccctatattatcataggtcattcaatgggctccattattgctaga
tcatttgttgaaaagtatcctgacattgctcaaggtttaattcttacaggaacaggtatg 
ttccccaagtggaaaggtgtaccaatacgtttagcaatgaagttagttacatttattttt 
gggaaacgacgtcgactcaagtgggtgaatcaattattgaataaaactttcaataaaaaa 
atcactcaacctcgaacagatagtgattggatttctacacgtcaggatgaagttgataaa 
tttgtggaagatgaattttgtggattcaaagtatctaatcagctcatttatcaaacttta
aagaccatgatgaagacagtagaacgacaacaactaaaaagaatggacaaagaactacct 
atactatttatttctgggaaagatgatccttttggtgaatatggtaaaggtataaagcat 
ttagctagattatataaaagagcaggtattaaacatataacagtacaactatataaacat 
aagcgtcatgaaatattatttgaagaagattatttgaaaacatggcaacacatgtttgaa
tggatggaaaagcaaattttgaaaaaacaaaagtga 

Sequence 1458 

VKIDKAKKSTIGIVHLFHGMAEHMDRYQELVEALNTQGYDVVRHNHRGHGKEIDENERGH
FNSMNQIVDDAYEIIETLYLEELNVPYIIIGHSMGSIIARSFVEKYPDIAQGLILTGTGM
FPKWKGVPIRLAMKLVTFIFGKRRRLKWVNQLLNKTFNKKITQPRTDSDWISTRQDEVDK
FVEDEFCGFKVSNQLIYQTLKTMMKTVERQQLKRMDKELPILFISGKDDPFGEYGKGIKH 
LARLYKRAGIKHITVQLYKHKRHEILFEEDYLKTWQHMFEWMEKQILKKQK* 

Sequence 1459

Contig_0607_pos_4521_5510,

is similar to (with p-value 3.0e-43) 

>gp:gp|AL022268|SC4H2_12 Streptomyces coelicolor cosmid 4H2.
NID: g3036873. 

atgacaaagccttttttaatcgttattgtaggtccaactgcttcaggtaaaactgagtta
agtattgaagttgctaaaaaatttaatggagaaattattagcggagattcaatgcaggtc
tatcaaggaatggatattggtacagcaaaagttacaactgaagaaatggaaggtatacca 
cattatatgatagatattttgcctccagatgcttccttttctgcatatgaatttaaaaaa 
agggcagaaaaatatattaaagatattactagaagaggcaaggtgcctattatagcagga 

ggaacaggactatatatacaatctctcttatacaactatgcttttgaagatgaatccata
tctgaagataaaatgaaacaagttaaattaaagttaaaagaacttgagcatctaaataat
aataagctccacgaatatttagcttcattcgacaaagaatcagccaaggatatacatcct
aataacagaaaaagagtgttgcgagcaatagaatattatttgaaaacaaaaaaactttta
agttctcgcaagaaagtgcaacaatttactgaaaattatgatacattattaatagggatt

gaaatgtcgcgtgaaacattatatttaagaataaataaacgtgttgatattatgttgggc
cacggattatttaatgaagtgcaacatctcgttgaacaaggttttgaagcgagtcaaagt 
atgcaagccattgggtataaagagcttgtacccgttattaagggaaatataagcatggaa 
aatgctgtagagaaattaaaacagcattctcgacaatatgctaaaagacagttgacttgg 
tttaaaaataaaatgaatgttcattggttaaataaagaaaggatgtcacttcaaatgatg 

ttagatgagattacaacccaaataaataaaaggagttctaaccatgattgcaaacgaaaa
catccaagaccaagcactagagaactttaa 

Sequence 1460

MTKPFLIVIVGPTASGKTELSIEVAKKFNGEIISGDSMQVYQGMDIGTAKVTTEEMEGIP
HYMIDILPPDASFSAYEFKKRAEKYIKDITRRGKVPIIAGGTGLYIQSLLYNYAFEDESI
SEDKMKQVKLKLKELEHLNNNKLHEYLASFDKESAKDIHPNNRKRVLRAIEYYLKTKKLL 
SSRKKVQQFTENYDTLLIGIEMSRETLYLRINKRVDIMLGHGLFNEVQHLVEQGFEASQS 
MQAIGYKELVPVIKGNISMENAVEKLKQHSRQYAKRQLTWFKNKMNVHWLNKERMSLQMM 
LDEITTQINKRSSNHDCKRKHPRPSTREL* 


Sequence 1461

Contig_0607_pos_6502_7752,

is similar to (with p-value 3.0e-85) 

>gp:gp|U66480|BSU66480_2 Bacillus subtilis SpoVK (spoVK), Yn
bA (ynbA), YnbB (ynbB), GlnR (glnR), glutamine synthetase (g
lnA), YnaA (ynaA), YnaB (ynaB), YnaC (ynaC), YnaD (ynaD), Yn
aE (ynaE), YnaF (ynaF), YnaG (ynaG), YnaH (ynaH), YnaI (ynaI
), YnaJ (ynaJ), xylan beta-1,4-xylosidase (xynB), xylose rep
ressor (xylR), xylose isomerase (xylA), xylulose kinase (xyl
B), YncB (yncB), YncC (yncC), YncD (yncD) and YncE (yncE) ge
nes, complete cds. NID: g1750106. >gp:gp|Z99113|BSUB0010_37
Bacillus subtilis complete genome (section 10 of 21): from 1
781201 to 2014980. NID: g2634090.

atgtataacgtgacacagcatgcgacttatagaacaaaaaataaacgagaaactgctgta 
ttaatcggtgtacatgctcaaacggatcgtcaatttaattttgaatctactatggaagag
ctcgatgctttatcacaaacttgccaacttaatgttaaaggacaaatcactcaaaataga 
gagcaatttgaccataaatattatgttggaaaaggaaaaatcgatgaaataaaatctttc 
atagaattccatgatatagatgttgtcgtaaccaacgatgaattaacgacggcacagtct 
aaaacgttaaatgataatttgggcattaaaatcatcgatagaacccaattaattttagag
atattcgcgttgcgagcgagaagtagagagggaaagctacaagtagaacttgcacaactc 
gattatttgttaccaagactacatggtcatggtaaaagcctatctcgtcttggtggtggc 
ataggaacaagaggcccaggtgaaacaaaattagaaatggatcgtcgccatattagaaca 
cgtatgaatgagattaaacatcaattaaaaacggtcgtggatcatcgggaaagatataga 

aataaacgtgaacaaaatcaagtttttcaaatcgctttagttggttatacaaatgcagga
aaatcgtcatggtttaatgttttagctaatgaggagacctatgaaaaaaatattttgttt 
gcaacattagatcccaaaacacgacaaatacaagtgaatgaaggatttaatttaattatt 
tctgatacggtaggatttattcagaaattaccaacgacattggtggctgcgtttaaatct 
acactagaagaagctaaaggtgcagacgtacttatgcatgtcgtcgatgcaagtcattcg 

gaataccgtactcaaattgacactgtaaatcaaattattaatgatttagatatggaccat
attccacaagtagttatttttaataaaaaagacttatgtaacgaacagatggatgtacct 
gtatctaaatctgcgcatgtttttgtatctagtcgtgatgaaaatgataaacaaaaggtg 
aaaaatttagtaattcaagaaataaaaaatagtctcagcccatacgaagaaattgtagat 
agtgctgatgcagatagattatattttcttaaacaacacacgcttgttactgaattaata 

tttgacgaaacacaagcatcttatcgtatcaaaggatttaaaaaattataa

Sequence 1462

MYNVTQHATYRTKNKRETAVLIGVHAQTDRQFNFESTMEELDALSQTCQLNVKGQITQNR 
EQFDHKYYVGKGKIDEIKSFIEFHDIDVVVTNDELTTAQSKTLNDNLGIKIIDRTQLILE 
IFALRARSREGKLQVELAQLDYLLPRLHGHGKSLSRLGGGIGTRGPGETKLEMDRRHIRT
RMNEIKHQLKTVVDHRERYRNKREQNQVFQIALVGYTNAGKSSWFNVLANEETYEKNILF
ATLDPKTRQIQVNEGFNLIISDTVGFIQKLPTTLVAAFKSTLEEAKGADVLMHVVDASHS 
EYRTQIDTVNQIINDLDMDHIPQVVIFNKKDLCNEQMDVPVSKSAHVFVSSRDENDKQKV
KNLVIQEIKNSLSPYEEIVDSADADRLYFLKQHTLVTELIFDETQASYRIKGFKKL* 


Sequence 1463

Contig_0607_pos_7768_8277, 

is similar to (with p-value 3.0e-34) 

>gp:gp|U66480|BSU66480_3 Bacillus subtilis SpoVK (spoVK), Yn
bA (ynbA), YnbB (ynbB), GlnR (glnR), glutamine synthetase (g
lnA), YnaA (ynaA), YnaB (ynaB), YnaC (ynaC), YnaD (ynaD), Yn
aE (ynaE), YnaF (ynaF), YnaG (ynaG), YnaH (ynaH), YnaI (ynaI
), YnaJ (ynaJ), xylan beta-1,4-xylosidase (xynB), xylose rep
ressor (xylR), xylose isomerase (xylA), xylulose kinase (xyl
B), YncB (yncB), YncC (yncC), YncD (yncD) and YncE (yncE) ge
nes, complete cds. NID: g1750106. >gp:gp|Z99113|BSUB0010_38
Bacillus subtilis complete genome (section 10 of 21): from 1
781201 to 2014980. NID: g2634090.

atgcaagattttagcaatttagttgaagaagttgaaaacacacttattccttactttaga 
aaaattgaaaagcgtgcattatttaatcaggaaaaggtcttaaatgcttttcaccatgtt
aaagctagcgaaagtgatttacaggggtctacgggttatggatatgatgattttgggaga 
gaccatttagaacaaatttatgcgcacacatttaaagcagatgacgcacttgtaagacct 
caaattatttcaggtactcatgctattactttagctttacaaagtacgttaaaaaacaat 
gatgaactactttatattacaggtagtccatatgatacacttctagaagtcattggtata
aatggcaatggtgttgaaagtcttaaagaatatggtgttcgctataatgaagtcgaatta
cgtgacggtcgaattgatattcctaaagtcatcactgcaattaatgacaatacaaaaagt 
tgtagcaattcaacgatcaaaaggatatga 

Sequence 1464

MQDFSNLVEEVENTLIPYFRKIEKRALFNQEKVLNAFHHVKASESDLQGSTGYGYDDFGR
DHLEQIYAHTFKADDALVRPQIISGTHAITLALQSTLKNNDELLYITGSPYDTLLEVIGI
NGNGVESLKEYGVRYNEVELRDGRIDIPKVITAINDNTKSCSNSTIKRI*

Sequence 1465
Contig_0608_pos_205_948,

is similar to (with p-value 4.0e-17) 

>sp:sp|P42237|GUDT_BACSU PROBABLE GLUCARATE TRANSPORTER. >gp
:gp|D30808|BACYCB20_4 Bacillus subtilis DNA around 20 degree
s region of chromosome containing yckA-T genes. NID: g709995
. >gp:gp|Z99105|BSUB0002_77 Bacillus subtilis complete genom
e (section 2 of 21): from 194651 to 415810. NID: g2632457.
atggcagttctttgggcaattattgctaaagacttaccagaacaacataaaatggtaaac 
gatgccgagaaaagatttattactcaaaatcgtgatattgtcgcaactgagaaatcttta 

ccaccgtggaaacgttttttaagtcatttcagtttctacgcaatcgcattgcaatacttt
gtagttcagtttgttatcgcgttgttcctaatatggttaccaacatatttaactgaacaa 
tatcatgtgaatttcaaagaaatgactatcagtgcattaccttggttatttatgttcttc 
ttaattttatttgctggagctatttcagacaagattttgaatacaggtcaatcacgtttt 
gttgcacgtggcgtaattgcgattgcgggatttgtggtattctcaatttcaattttctta 

gcagtacatacagacaacttatatgtaaccattttctggttatcactttgtttaggtggc
gtaggtatttctatgggaatgagttgggctgcagccactgacttaggtcgtaatttctct 
ggaactgtgtcaggttggatgaacttatggggaaatgttggcgcacttattagtccttta 
cttgcaggaattgttgtagatcatttaggatggtcaatgacattacaactcttaatcgtt 
ccggctattatagcagcaattctatggttcttcgttaaaccagacaaacctcttattgtt 

agtgaagaacacgttaataaataa

Sequence 1466 

MAVLWAIIAKDLPEQHKMVNDAEKRFITQNRDIVATEKSLPPWKRFLSHFSFYAIALQYF 
VVQFVIALFLIWLPTYLTEQYHVNFKEMTISALPWLFMFFLILFAGAISDKILNTGQSRF 
VARGVIAIAGFVVFSISIFLAVHTDNLYVTIFWLSLCLGGVGISMGMSWAAATDLGRNFS
GTVSGWMNLWGNVGALISPLLAGIVVDHLGWSMTLQLLIVPAIIAAILWFFVKPDKPLIV
SEEHVNK* 

Sequence 1467 
Contig_0608_pos_1553_3469,

putative peptide of unknown function 

atgcaacaaatcattaactctttgatacacttcgaccgttcaaaaattgatatagctaaa 
ggaatacgacaaggatttttaatgatactaccagcattgataggttacttattaggattc
cctatgtttggtattctaatatcaacaggtacgctagcacatgtctacgtttttagtgga 

tcaccacaatctatgttaaaaacagtcatcacctgttcactatcatttactatttgcatg
attcttggcactttaacagtatctcagcctattttatttggattactattactgattgtt 
gttacaatcccatattatacgtttaatgcactaaaaatcgctggtccatcatcgacattc 
tttttagtaacgttttgcttatctataaacttaccgatagccccagaagaagcgctttta 
cgtggatctgcaattctcattggtggtatattggctaccataacagtaattttaacaatc 

atatttgctaaagagaaagcagaagacagagcaattcatgcggattttaaaacattacat
aacttgctacatcattttgatgagccagaggatttcaaagcatatgctcgaaacgctgtt 
acagaatttagaaattctgaaaaacttttaattacctcaacatcaggtggtaatggaaaa 
ttaagtaaaaggtttcagaaattaattttattacacacatcagcacaagggatatattca 
gaattactagaactcaatgaaaatcatattcgtccattaccaagtgacttagttgaaatg 

atggatcatatcattaatagtgttcaacaacctaaacaacaatatcgaccgtggtcaaaa
gttgttgatgtggcaccagaatttcaaaatttaatggatcatattttgaaaatagatgaa 
atgattcacgcaaacgataatcaaattaaatatgaagcagatattcgcaagcctttatat 
agtaagcgaatatatcaaaatctaactttcgactcaattgtatttagaaatgctttgcaa 
tatacagttattatggcagtagctatatttattgctctagcgtttaacattcaaaaagcg 

tattgggtgccattgtcagcgcataccatcatgttaggtaatgtgacaacgattcgtacg
ttagacaggtcacttgctagaggtattggaacgataatcgggactattgttttgtcggga 
atcttggcatttcatatcgatcctattttcgctattatcattatgggattttctgccatg 
atgacggaagcgtttgtggcatctaactatgcatttgcagtcatttttattacgacacaa 
gtcattatgctcaatggtctagcctctcaaaatttaaatattgagatagcgtacacccga 

attattgacgttctgataggtatagctattgcagttattggtatattcatactcgcgcgt
aaaactgcatcctcaatgttatctgatgctattgccgaattggtacgtaaagaaggtatt 
ttatttcattatttattttcaaaaaataaacaggaaacgaatgaacgtgatagggtagaa 
agtttaaatttaaacgtcaaaatcagtaatgtgacacagatttacaattcagcgaatggt 
gaattgtttagtaataaagaagcggtaaggtattactatccaagcatatttgctctagag
gaaataagttttatgctagagcgtgccatgaataataaacatcgacaaacaataaatgat
gatttaatgggtgaatatttagtcgtatttgaaaatatagctaagcatttccaatttcaa 
gcagatttaaatctcagagacatgcaaccattacctcaatacaattacatccgtgcttcc 
cttatgaatatacaacgtaattgtgttgaacaacgtcaggccatcacaaaagattaa 

Sequence 1468

MQQIINSLIHFDRSKIDIAKGIRQGFLMILPALIGYLLGFPMFGILISTGTLAHVYVFSG 
SPQSMLKTVITCSLSFTICMILGTLTVSQPILFGLLLLIVVTIPYYTFNALKIAGPSSTF 
FLVTFCLSINLPIAPEEALLRGSAILIGGILATITVILTIIFAKEKAEDRAIHADFKTLH

NLLHHFDEPEDFKAYARNAVTEFRNSEKLLITSTSGGNGKLSKRFQKLILLHTSAQGIYS
ELLELNENHIRPLPSDLVEMMDHIINSVQQPKQQYRPWSKVVDVAPEFQNLMDHILKIDE
MIHANDNQIKYEADIRKPLYSKRIYQNLTFDSIVFRNALQYTVIMAVAIFIALAFNIQKA
YWVPLSAHTIMLGNVTTIRTLDRSLARGIGTIIGTIVLSGILAFHIDPIFAIIIMGFSAM
MTEAFVASNYAFAVIFITTQVIMLNGLASQNLNIEIAYTRIIDVLIGIAIAVIGIFILAR 

KTASSMLSDAIAELVRKEGILFHYLFSKNKQETNERDRVESLNLNVKISNVTQIYNSANG
ELFSNKEAVRYYYPSIFALEEISFMLERAMNNKHRQTINDDLMGEYLVVFENIAKHFQFQ
ADLNLRDMQPLPQYNYIRASLMNIQRNCVEQRQAITKD*

Sequence 1469 
Contig_0608_pos_3547_3960,

putative peptide of unknown function 

atgccttggacgatggaagattatcctcaaagttggaaaaactttgaagaactcgaacgg 
aaaaaagcgattgatattggtaatgcgatgctcaaagatggctataaagaatctgatgtg
atacctattgctacaaatcaagctgaaaaatggtatgaacatgcttcaaaagaagaatta 
gagacgttaaaaaacaaacatatcacgcaacatcaagaagatgaatcagctaatcctaaa
cttaacgaagaaaatgttcatgtatattatgaagatcagctatggaaagtaaaatctaaa 
gaggctagacgagcttcagatacatttgacacaaaatctgaagcagttaaccgtgcacaa 
catatcgcagagaataaaggtaccaaagtgattgagcatcgaaaagatgagtga 

Sequence 1470

MPWTMEDYPQSWKNFEELERKKAIDIGNAMLKDGYKESDVIPIATNQAEKWYEHASKEEL
ETLKNKHITQHQEDESANPKLNEENVHVYYEDQLWKVKSKEARRASDTFDTKSEAVNRAQ
HIAENKGTKVIEHRKDE* 

Sequence 1471

Contig_0608_pos_12044_11130,

is similar to (with p-value 7.0e-76)

>sp:sp|P94463|FMT_BACSU METHIONYL-TRNA FORMYLTRANSFERASE (EC
2.1.2.9). >gp:gp|Y10304|BSPRIADFS_3 B.subtilis priA, def, f
mt, sun genes. NID: g1772497.

atgggaacacctgatttttcaacgaaaattttagagatgttaattgctgagcatgaagtt 
atcgctgtagtgacacaacctgatagaccagtgggacgtaagaaagtgatgacaccacca 
ccagtaaaaagagtagctacaaagcatcaaataccggtatatcaacctgaaaaacttaaa 
gattctcaagaattagaatcgttactttctttagaatcagatttaatagtaacggctgcg

ttcggtcaactattaccagagtccttactcaatgcacctaaattaggagctattaatgtc
catgcatcattgctacctaagtatagaggaggagcacctatacatcaggctataattgat 
ggtgaagaagaaactggaatcacgattatgtatatggttaaaaaacttgatgcaggtaat 
atcatctcgcaacaatcaattcgtattgaagaagaagataatgttggcgcaatgcatgat 
aaattaagctttttaggtgccgaattattaaagaagacacttcctagtatcattgataat 

accaatgacagtatccctcaagatgatgcacttgcaacatttgcatctaatattcgtcgt
gaagacgagagagttgattggaatatgagtgcacaagcaattcataaccatattagagga 
ctgtctccatggccagttgcttatacaactatgaatgaaaagaatctcaaattatttagc 
gctttcattgtgaaagggaaaaaaggtaatccaggaacaattattgaaactactaagcat 
gaactcatcatagctaccggttctgatgatgccatcgcacttactgagattcaacctgca

gggaaaaaacgtatgaaagttactgattatttaagtggtgtacaagagtcgttagttggg
aaagttctattatga 

Sequence 1472 

MGTPDFSTKILEMLIAEHEVIAVVTQPDRPVGRKKVMTPPPVKRVATKHQIPVYQPEKLK
DSQELESLLSLESDLIVTAAFGQLLPESLLNAPKLGAINVHASLLPKYRGGAPIHQAIID
GEEETGITIMYMVKKLDAGNIISQQSIRIEEEDNVGAMHDKLSFLGAELLKKTLPSIIDN
TNDSIPQDDALATFASNIRREDERVDWNMSAQAIHNHIRGLSPWPVAYTTMNEKNLKLFS
AFIVKGKKGNPGTIIETTKHELIIATGSDDAIALTEIQPAGKKRMKVTDYLSGVQESLVG
KVLL*

Sequence 1473 

Contig_0608_pos_11121_9826, 
is similar to (with p-value 1.0e-92)
>sp:sp|P94464|SUN_BACSU SUN PROTEIN. >gp:gp|Z99112|BSUB0009_
44 Bacillus subtilis complete genome (section 9 of 21): from
1598421 to 1807200. NID: g2633902. >gp:gp|Y13937|BSY13937_1
0 Bacillus subtilis genomic DNA from the spoVM region. NID: 
g2337793. 

gtgcgaacatatgcattagaaacaatcaacgacgtcctaaataaaggggcttatagtaat
ttgaaaattaatgaagttctatctacaaataacattaatacagtagataaaaatttattc 
acagaattagtatatggaacaataaaaagaaaatactcgttagattatctactaaagcct 
tttatcaaaactaaaatcaaatcatgggtgcgacaattactgtggatgagtttatatcaa 
tatttatatttagataaaatacctaaccatgctattattcatgaagcggtagatatagca 

aaaaaacgtggtggctatcacacagggaatatagtcaatggtgtattacgaacagtaatg
cgcactgaattgccaagctttgaagatatagatgatactaaaaaaagaattgcaattcaa 
tatagtcttcccaaatggattgttgatcattgggttacacattttggagtagaaaaaact 
gaaaacattgcacgatcttttttagagcctgtaaccacaaccgtgcgcgccaatatatct 
cgtgattctattgattcaattatctctaagttagaacaggaaggttaccacgttaaaaaa 

gacgatatgttaccattttgtcttcatatatcaggtatgcctgtggttaattcaaacgct
tttaaagaaggttatatctctattcaagataaaagttcaatgatggtagcttatgtaatg 
aacctagggcgagatgacaaagttttagatgcgtgcagcgcacctggtggtaaagcttgt 
catatggcagaaattctttcaccagaaggtcacgtcgatgcaacagatattcatgaacat 
aaaataaatcttataaagcaaaatattaaaaaattgaaattgaataatatcaaggctttt 

caacatgatgctacagaagtatacgataaaatgtatgataagattcttgttgatgcacca
tgtagtggattaggtgttcttagacacaaacctgaaattaaatatagtcaatcacaaaat 
agcattaagtctttagtagaattacaattacaaattttagaaaatgttaaagataatatt 
aaacctggtggtacaatagtgtattcaacatgtacaatagaacaaatggaaaacgaaaat 
gtcatctatacttttttaaagagacataaagattttgagtttgaaccattccaaaatcca 

gcgactggtgaacaggttaaaacgttacagatacttccacaagattttaattcggatgga
ttctttattagcaagataaaaagaaaggaaagttag 

Sequence 1474 

VRTYALETINDVLNKGAYSNLKINEVLSTNNINTVDKNLFTELVYGTIKRKYSLDYLLKP 
FIKTKIKSWVRQLLWMSLYQYLYLDKIPNHAIIHEAVDIAKKRGGYHTGNIVNGVLRTVM
RTELPSFEDIDDTKKRIAIQYSLPKWIVDHWVTHFGVEKTENIARSFLEPVTTTVRANIS
RDSIDSIISKLEQEGYHVKKDDMLPFCLHISGMPVVNSNAFKEGYISIQDKSSMMVAYVM 
NLGRDDKVLDACSAPGGKACHMAEILSPEGHVDATDIHEHKINLIKQNIKKLKLNNIKAF 
QHDATEVYDKMYDKILVDAPCSGLGVLRHKPEIKYSQSQNSIKSLVELQLQILENVKDNI 
KPGGTIVYSTCTIEQMENENVIYTFLKRHKDFEFEPFQNPATGEQVKTLQILPQDFNSDG
FFISKIKRKES* 

Sequence 1475 
Contig_0608_pos_8722_7979,
is similar to (with p-value 1.0e-26)

>gp:gp|Z70722|MLCB1770_13 Mycobacterium leprae cosmid B1770.
NID: g2344819. 

atgttaaacgcacaatttttcactgatactgggcaacatcgtgagaaaaacgaggacgct 
ggcggtatattttacaatcaaacacagcaacaaatgctagtattatgcgatggcatgggt 
ggacatcaagctggagaaatagctagtcagtttgtcacttatgaacttcaaaagcgtttt
gaagaagagaatctaattgaaataaatcgtgctgaatcgtggttgcgttcgaacattaaa 
gaaatcaattttcagctgtacaactatgctcaagaaaatgaagattacagaggtatgggt 
acaacgctcgtttgtgccatcatttatgacaaacaagttgttgtagcaaatgtaggagat 
tcgcgtgcttatgtaattaatcagagacagatggatcaaattacgagcgaccattcattt
gttaatcacttagtaatgactggacaaattactaaagatgaagcatttcatcatccacaa
cgtaatattattactaaagtcatgggaacagataaacgtgtttctccagatttatttatc 
aagagaactcatttttatgattatcttcttttgaactctgacggacttactgattatgtc 
agagattatgaaatccaagaactactaagttcaaataattcattagacgtccatggtaat 
gagttattggacttagcgcttgcccatgattcaaaagataatgtcagctttatcctttta
aagttagaaggtgataaagtatga 

Sequence 1476 

MLNAQFFTDTGQHREKNEDAGGIFYNQTQQQMLVLCDGMGGHQAGEIASQFVTYELQKRF 
EEENLIEINRAESWLRSNIKEINFQLYNYAQENEDYRGMGTTLVCAIIYDKQVVVANVGD
SRAYVINQRQMDQITSDHSFVNHLVMTGQITKDEAFHHPQRNIITKVMGTDKRVSPDLFI
KRTHFYDYLLLNSDGLTDYVRDYEIQELLSSNNSLDVHGNELLDLALAHDSKDNVSFILL 
KLEGDKV* 

Sequence 1477

Contig_0608_pos_7766_5979,

is similar to (with p-value 4.0e-55)

>gp:gp|Z70722|MLCB1770_9 Mycobacterium leprae cosmid B1770.
NID: g2344819. 

atgattgacgtagatgaggaagatgattgtttttatattgttatggaatacatagaaggt
cctactttagcggaatatatacacagtcatggtccacttagtgtagaaactgctattcag 
tttacagaacaaattttgagtggaatcaaacatgcgcatgatatgagaattgttcatcgc 
gatattaaaccacagaatatattaattgataagaataaaaaattacaaatttttgatttt 
ggtattgctaaagcattgagtgaaacgtcgctgactcaaacaaatcatgttttaggaact 

gttcaatatttatcgcctgaacaggctaaaggcgaagctactgatgaaagtactgacata
tattcaattggaattgtattatatgagatgttagttggagagccaccattcaatggagag 
actgcagtgagtattgctataaagcatattcaagatagtatcccaaatataacaacggat 
aaaagagatgatgtacctcaatcattgagtaatgtcgttttacgtgctaccgagaaagat 
aaatatcatcgctatcatactgttcaagaaatgtgtgacgatttaacaagtgctttacat 

gagaatcgtttgaatgaagaaaaatacgagctagataaaactaaaacggtaccactcact
aaagatgatttgaatcataaaatctatgatgaacaaaatcaaaatgaccttaataaaacc 
atgcaaatacctattgttaacgattcaataaagcaacaagaatttcaatcgtctgaacca 
cgttattatcaaaaaagcgacaagaaacgttctaaaatgaaaattgcaattttatcaatc 
atttttgtaatattattaattggtttattttcttttgtagctatggctgtttttggaaat 

aaatatgaagaaatgcctgaccttaaagggaaaactgaaaaacaagctgaaaaggtatta
gacagtcatcatctaaaagtaggtgacatatcaagaaattatagtgataaatatcctgaa 
aaccaaattattaaaacaaatccagatagtggagaacgcgtcgaacaagggaatagagtt 
gatatcgttctatccaaaggaccagagaaggttaagatgccaaatgttttaggtatgtcg 
aaaaatgatgcgctaaaaaaattaaaggctatcggatttaaagatattcacgttgagcaa 

gcttatagtcaaacatatgaaaaaggattaatttctgaacaaagcgttgtagctaatagt
gaggttgccgttaataatcatcatattacaatttatgaatcattaggtgttcgacaagta 
tatgtcaataattatgaaaataagtcatatgagtcagcaaaaaaagaacttgaagataaa 
ggatttaaagttcaagtgacaaaagaaaacaacgatgatgtcgaaaaaggtaatgtcatt 
tctcaatctccaaaggaLaaaactgttgatgaaggttctactatactattagtggtttct 

aaaggagaaaagtctgaagaagaagatgatgaggaggacaaggatacaacgactaaaaat
gagactgttaaagtaccgtataccggtaaaaaaagtaaaagtcaaaaagtagaagtattt 
attcgtgatattgaaaataaaggtgattcagcagttcaaacgtttaatattaaaagtgat 
aaaacaattaatattcctttgaaaattaaaaaaggaagtgacgctgggtacaccataaga 
gttgataataaaattgtagctgataaagatgtgagctatgatggctaa 


Sequence 1478 

MIDVDEEDDCFYIVMEYIEGPTLAEYIHSHGPLSVETAIQFTEQILSGIKHAHDMRIVHR 
DIKPQNILIDKNKKLQIFDFGIAKALSETSLTQTNHVLGTVQYLSPEQAKGEATDESTDI
YSIGIVLYEMLVGEPPFNGETAVSIAIKHIQDSIPNITTDKRDDVPQSLSNVVLRATEKD 
KYHRYHTVQEMCDDLTSALHENRLNEEKYELDKTKTVPLTKDDLNHKIYDEQNQNDLNKT
MQIPIVNDSIKQQEFQSSEPRYYQKSDKKRSKMKIAILSIIFVILLIGLFSFVAMAVFGN 
KYEEMPDLKGKTEKQAEKVLDSHHLKVGDISRNYSDKYPENQIIKTNPDSGERVEQGNRV 
DIVLSKGPEKVKMPNVLGMSKNDALKKLKAIGFKDIHVEQAYSQTYEKGLISEQSVVANS 
EVAVNNHHITIYESLGVRQVYVNNYENKSYESAKKELEDKGFKVQVTKENNDDVEKGNVI
SQSPKDKTVDEGSTILLVVSKGEKSEEEDDEEDKDTTTKNETVKVPYTGKKSKSQKVEVF
IRDIENKGDSAVQTFNIKSDKTINIPLKIKKGSDAGYTIRVDNKIVADKDVSYDG*

Sequence 1479 
Contig_0608_pos_5685_4795,

is similar to (with p-value 4.0e-39)

>sp:sp|P45339|YJEQ_HAEIN HYPOTHETICAL PROTEIN HI1714. >pir:p
ir|B64176|B64176 hypothetical protein HI1714 - Haemophilus i
nfluenzae (strain Rd KW20) >gp:gp|U32844|U32844_6 Haemophilu
s influenzae Rd section 159 of 163 of the complete genome. N
ID: g1574563.

atgagaggtgtctttttgaagactggtcgaatcgttaaactaatcagtggtgtgtatcaa 
gtagacgtagaaggtgagagatttgataccaaaccacgtggtttattcagaaaaaagaag 
ttttcacctgtggtgggcgatatcgtagattttgaagttcaaaatacaaaagagggctat 

attcatcatgtacatgaccgaaataatgaactaaaacgaccacctgtaagtaatattgac
gagttagttatagtaatgagtgcagtcgagcctgaattttcaacacaattattagatcgc 
tatttagtgattgctcattcttatcatctcaaacctagaattttaatcactaaacatgat 
ttagcttccgaacaagaaattcttaaaatcaaagacacaataaaaatatatcaaaaaata 
ggatatgctacgcagtttattggaaaagatagtaattatactgctactgttgatgaatgg 

tctgacggtttaatagtattaagtggccaatctggagtgggtaaatctactttcttaaat
agttatcagcctcagttgaagttagaaacaaatgatatttctaagtcattgaataggggt 
aaacatactacaagacatgtcgaattatacgatagaaaaggtggttacatcgctgataca 
ccggggtttagtgcgttagattttaatcatattgaaaaagaacaactaaaagattttttt 
attgatattcatgaagctggagagcaatgtaagtttcgtaattgtaatcatataaaagaa 

ccacaatgtcatgtcaaagcactcgttgaaaaaggagaaattccacaattcaggtatgat
cattatcagcaattatataatgaaatttccaatagaaaggttcgatactaa 

Sequence 1480 

MRGVFLKTGRIVKLISGVYQVDVEGERFDTKPRGLFRKKKFSPVVGDIVDFEVQNTKEGY
IHHVHDRNNELKRPPVSNIDELVIVMSAVEPEFSTQLLDRYLVIAHSYHLKPRILITKHD
LASEQEILKIKDTIKIYQKIGYATQFIGKDSNYTATVDEWSDGLIVLSGQSGVGKSTFLN
SYQPQLKLETNDISKSLNRGKHTTRHVELYDRKGGYIADTPGFSALDFNHIEKEQLKDFF 
IDIHEAGEQCKFRNCNHIKEPQCHVKALVEKGEIPQFRYDHYQQLYNEISNRKVRY* 

Sequence 1481

Contig_0609_pos_4636_3806,

is similar to (with p-value 2.0e-88)

>sp:sp|P18156|GLPF_BACSU GLYCEROL UPTAKE FACILITATOR PROTEIN
. >pir:pir|C47700|C47700 glycerol uptake facilitator glpF pr
otein - Bacillus subtilis >gp:gp|M99611|BACGLPPFK_2 Bacillus
subtilis antiterminator regulatory protein (glpP), glycerol
uptake facilitator (glpF) genes, complete cds, glycerol kin
ase (glpK) gene, 5' end. NID: g142995. >gp:gp|Z99108|BSUB000
5_196 Bacillus subtilis complete genome (section 5 of 21): f
rom 802821 to 1011250. NID: g2633055. >gp:gp|Z99109|BSUB0006
_3 Bacillus subtilis complete genome (section 6 of 21): from
999501 to 1209940. NID: g2633260. >gp:gp|Y14079|BSY14079_3
Bacillus subtilis chromosomal DNA, region 75 degrees: glpPFK
D operon and downstream. NID: g2226133.

atgtatatgaatgcttatttagcagaatttttaggtactgcaatccttattctttttggt
ggtggcgtttgtgcaaacgttaacttaaagagaagtgctggtaacggtgcagattggatt 
gttattgcatttggttggggtttggcagtaacaatgggcgtttatgctgttggaacgttt 
tctggtgcacatttaaatccagctgtaacagttgctttagccatggatggtggatttagc 
tgggcgcaagtaccgggctatattgtttgtcaaatgcttggcggtattgttggtggagtt 

tttgtatggttaatgtatttaccacactggaaagttacagaagatccagcagtcaaatta
ggtgtattttcaacagcaccagccattaaaaattattttgctaactttttaagtgagatt
atcgggactatggctttaacattaggaattttatttatcggggttaataaaattgctgat 
ggtttaaatccaattattgttggtagtcttatcatagcaattggtttaagcttaggaggt 
actactggttacgctattaatccagcccgtgacctagcaccacgtattgcacatgctatt
ttgccaattcatggtaaaggtaaatctaactggtcttacgcaattgtacccgttctggga
cccatggcaggtggtatgttaggtgcgattgtttacgaagtgttttataaacaaacattc 
aattttagttgtttcattggtttaattgtacttatattcacacttatacttggcgtgata 
ctaaataagatatctcaaaataaaaacaacgatattgaatcaatttattaa 


Sequence 1482 

MYMNAYLAEFLGTAILILFGGGVCANVNLKRSAGNGADWIVIAFGWGLAVTMGVYAVGTF
SGAHLNPAVTVALAMDGGFSWAQVPGYIVCQMLGGIVGGVFVWLMYLPHWKVTEDPAVKL
GVFSTAPAIKNYFANFLSEIIGTMALTLGILFIGVNKIADGLNPIIVGSLIIAIGLSLGG
TTGYAINPARDLAPRIAHAILPIHGKGKSNWSYAIVPVLGPMAGGMLGAIVYEVFYKQTF
NFSCFIGLIVLIFTLILGVILNKISQNKNNDIESIY*

Sequence 1483 
Contig_0609_pos_3652_2153, 

is similar to (with p-value 0.0e+00)

>sp:sp|P18157|GLPK_BACSU GLYCEROL KINASE (EC 2.7.1.30) (ATP:
GLYCEROL 3-PHOSPHOTRANSFERASE) (GLYCEROKINASE) (GK). >pir:pi
r|B45868|B45868 glycerol kinase (EC 2.7.1.30) - Bacillus sub
tilis >gp:gp|M34393|BACGLPKD_2 B.subtilis glycerol kinase (g
lpK) and glycerol-3-phosphate dehydrogenase (glpD) genes, co
mplete cds. NID: g142990. >gp:gp|Z99108|BSUB0005_197 Bacillu
s subtilis complete genome (section 5 of 21): from 802821 to
1011250. NID: g2633055. >gp:gp|Z99109|BSUB0006_4 Bacillus s
ubtilis complete genome (section 6 of 21): from 999501 to 12
09940. NID: g2633260. >gp:gp|Y14079|BSY14079_4 Bacillus subt
ilis chromosomal DNA, region 75 degrees: glpPFKD operon and
downstream. NID: g2226133.

atggaaaaatatattttatcaattgatcaaggaactacgagttcacgtgcgatacttttt 
aataaagaaggagaaattaaaggtgtttctcaaagagaatttaaacaacactttccacat 

ccaggctgggtagaacatgatgctaatgaaatatggacatctgttctatcagttatggct
gagttacttaatgaaaacaatattaatgcaaatcaaattgaaggtattggtattacaaac 
caacgtgaaacgacagttgtatgggataaaaatacaggtcgtccaatctatcacgctatc
gtttggcaatcacgtcagacacaagatatttgtacaaatttaaaggaacagggttatgaa 
gaaacatttagagaaaaaacaggtttacttttagacccgtactttgcgggaactaaagta 

aaatggattcttgatcatgttgaaggtgctagagaaaaagctgaaaatggtgatttactc
ttcggaacaatcgattcatggttagtatggaaattgtcaggacgtactgctcatattaca 
gattacactaatgcaagtcgtacattaatgtttaatatttatgacctaaaatgggatgat 
gagttgttagaacttttaaatattcctaaacaaatgttacctgaagttaaagaatcaagt 
gaaatttacgggaaaactatcgactatcacttctttggtcaagaagtacctattgctggt

attgccggtgaccaacaagcagcattatttggtcaagcatgttttgaccgtggtgatgta
aaaaatacatacggcacaggtggatttatgctaatgaatactggtgaagaagcagttaag 
tcagaaagtggcttgttaacaaccattgcatacggtttagatggaaaagttaattatgca 
cttgaaggttcaattttcgtatctggttctgctatccaatggctacgagatggtttgaga 
atgattaattctgcgccacaaaccgaaaactatgcttcaagagtagagtcaactgagggt 

gtttatatggttccagcatttgttggtttaggtacaccttattgggattcagaagcaaga
ggtgctattttcggattatctcgtggtacggaaaaagaacatttcattcgtgctacatta 
gaatctttgtgctatcaaacaagagatgttatggaagctatgtctaaggactcaggtatt 
gaagttcaaaatttacgcgttgatggtggtgctgtaaaaaataacttcattatgcagttc 
caagcagatatcgtaaattcatctgttgaaagacctgaaatccaagaaacaacagcactt 

ggtgctgcatatttagctggattagctgttggattctgggatgataaagaggatatccgt
gaacgttggaaacttcaaactgagttcaaaccagaaatggatgcagatcaacgtcataaa 
ctttatagtggttggaaaaaagctgttaaggcgactcaagtatttaaattagaagattaa 



Sequence 1484

MEKYILSIDQGTTSSRAILFNKEGEIKGVSQREFKQHFPHPGWVEHDANEIWTSVLSVMA 
ELLNENNINANQIEGIGITNQRETTVVWDKNTGRPIYHAIVWQSRQTQDICTNLKEQGYE
ETFREKTGLLLDPYFAGTKVKWILDHVEGAREKAENGDLLFGTIDSWLVWKLSGRTAHIT 
DYTNASRTLMFNIYDLKWDDELLELLNIPKQMLPEVKESSEIYGKTIDYHFFGQEVPIAG
IAGDQQAALFGQACFDRGDVKNTYGTGGFMLMNTGEEAVKSESGLLTTIAYGLDGKVNYA
LEGSIFVSGSAIQWLRDGLRMINSAPQTENYASRVESTEGVYMVPAFVGLGTPYWDSEAR 
GAIFGLSRGTEKEHFIRATLESLCYQTRDVMEAMSKDSGIEVQNLRVDGGAVKNNFIMQF
QADIVNSSVERPEIQETTALGAAYLAGLAVGFWDDKEDIRERWKLQTEFKPEMDADQRHK 
LYSGWKKAVKATQVFKLED*

Sequence 1485 

Contig_0609_pos_1976_303, 

is similar to (with p-value 0.0e+00)

>sp:sp|P18158|GLPD_BACSU AEROBIC GLYCEROL-3-PHOSPHATE DEHYDR
OGENASE (EC 1.1.99.5). >pir:pir|C45868|C45868 glycerol-3-pho
sphate dehydrogenase (EC 1.1.99.5) - Bacillus subtilis >gp:g
p|M34393|BACGLPKD_3 B.subtilis glycerol kinase (glpK) and gl
ycerol-3-phosphate dehydrogenase (glpD) genes, complete cds.
NID: g142990. >gp:gp|Z99108|BSUB0005_198 Bacillus subtilis
complete genome (section 5 of 21): from 802821 to 1011250. N
ID: g2633055. >gp:gp|Z99109|BSUB0006_5 Bacillus subtilis com
plete genome (section 6 of 21): from 999501 to 1209940. NID:
g2633260. >gp:gp|Y14079|BSY14079_5 Bacillus subtilis chromo
somal DNA, region 75 degrees: glpPFKD operon and downstream.
NID: g2226133.

atgtcattatctacattgaaaagggatcatattaaaaagaatttaagagacactgaatac 
gatgttgttatcgtaggtggcggtattacaggtgcaggtattgctttagatgcaagtaat
cgtgggatgaaggtagctttagtagagatgcaagactttgcacaaggtacaagttcacgc 

tcaactaaacttgtacacggtggtttaagatatttaaaacaactgcaagtaggggtagtt
gcagaaacaggtaaagaacgtgctattgtttatgaaaatggtccacatgtgacaacacca 
gaatggatgcttttacctatgcataaaggtggtacatttggtaaattctcaacttctatt 
ggactagctatgtacgatcgtctagctggtgtcaaaaaatccgaacgtaaaaaaatgtta 
tctaagcaagaaacgttaaataaagaacctttagttaaacgtgatggattaaaaggcggt 

ggctactatgtggaataccgcactgatgatgcgcgtttaactattgaagttatgaaaaaa
gctgctgaaaatggagcagaaatcattaattatacaaaatcagaacacttcacttatgat 
tccaataagaaagtaaatggtattgaagtattggatatgattgatggcgaaacgtatgcg 
attaaagctaaaaaagttattaatgcttctggtccttgggttgatgaagtgagaagtggc 
gattatgcacgtaacaataagcaattaagattaactaaaggtgtacacgttgttatagat 

caatctaaattcccattaggtcaagcagtttactttgatactgaaaaagacggacgcatg
atttttgcgattccacgtgaaggaaaagcttatgtaggaacaactgacacgttttatgat 
aatgaaaaagcaacacctttaacaacacaagaagatagagactacttaattaatgcaatt 
aactatatgttcccaacagttaatgttaaagatgaagatattgaatcaacatgggctggt 
attcgtccgctaattcttgaaaaaggtaaagatccttctgaaatctcacgtaaagatgaa 

gtttgggaaggtgaatctggattattaactatagcaggcggtaaattaactggttatcgt
catatggcactagaaattgttgatttattagctaaacgtttaaaacaagaatacggattg 
aaatttgaatcatgtgccacaaaaaatctaaaaatttccggtggtgacgttggcggaagc 
aaaaactttgaacactttgttgaacaaaaagttgatgcagctaaaggatttggaattgat 
gaagatgtggcacgtcgcttagcaagtaaatatggttcaaatgttgatcaactatttaat 

attgctcaaacggcaccatatcatgatagtaaattaccattagaaatttatgttgaatta
gtttatagtattcaacaagaaatggtttacaaaccaactgacttcttagtacgtcgttct 
ggcaaattatactttaatattcaagatgtgttagattataaaaatgctgtgatagatgtt 
atggcggatatgcttaattatagtgaaactcaaaaagaagcttatactgaagaagtagaa 
gttgcgattgatgaggcacgtacaggtaatgatcaacctgcaactaaagcttaa 


Sequence 1486 

MSLSTLKRDHIKKNLRDTEYDVVIVGGGITGAGIALDASNRGMKVALVEMQDFAQGTSSR 
STKLVHGGLRYLKQLQVGVVAETGKERAIVYENGPHVTTPEWMLLPMHKGGTFGKFSTSI
GLAMYDRLAGVKKSERKKMLSKQETLNKEPLVKRDGLKGGGYYVEYRTDDARLTIEVMKK
AAENGAEIINYTKSEHFTYDSNKKVNGIEVLDMIDGETYAIKAKKVINASGPWVDEVRSG
DYARNNKQLRLTKGVHVVIDQSKFPLGQAVYFDTEKDGRMIFAIPREGKAYVGTTDTFYD 
NEKATPLTTQEDRDYLINAINYMFPTVNVKDEDIESTWAGIRPLILEKGKDPSEISRKDE 
VWEGESGLLTIAGGKLTGYRHMALEIVDLLAKRLKQEYGLKFESCATKNLKISGGDVGGS 
KNFEHFVEQKVDAAKGFGIDEDVARRLASKYGSNVDQLFNIAQTAPYHDSKLPLEIYVEL
VYSIQQEMVYKPTDFLVRRSGKLYFNIQDVLDYKNAVIDVMADMLNYSETQKEAYTEEVE
VAIDEARTGNDQPATKA* 

Sequence 1487
Contig_0610_pos_767_1225,

putative peptide of unknown function 

atgacagactcaaatgctaaagaaataagaactggacgtttaattgcgataagttcatta 
gtgttttgtattttacttatcatacaccactttattgtattagatgaatcaacagctaaa 
tcaattttatctttagctggtcaaaaaacatcagatacagcagtgaaaaacattttaaat 
agtgaccgatacactggaattatgtatattttagcttacttagcaggtactgttgctttc
tggaatcgccatccatatttatggtggtttatgtttgccgtatatatttctaatgcacta 
tttacactcgtaaatctttacttatttattcaaggtattttagatgtaaaaaatgtactt 
gcagttttaccaattttaattgtagtgattggatctataattctagcaatttatatgcta 
gttgtttctattacacgtaaaagtactttcaatagatag 


Sequence 1488 

MTDSNAKEIRTGRLIAISSLVFCILLIIHHFIVLDESTAKSILSLAGQKTSDTAVKNILN
SDRYTGIMYILAYLAGTVAFWNRHPYLWWFMFAVYISNALFTLVNLYLFIQGILDVKNVL 
AVLPILIVVIGSIILAIYMLVVSITRKSTFNR* 


Sequence 1489 

Contig_0610_pos_2605_3345,

is similar to (with p-value 4.0e-84)

>sp:sp|Q06174|EST_BACST CARBOXYLESTERASE PRECURSOR (EC 3.1.1
.1). >pir:pir|JC1374|JC1374 carboxylesterase (EC 3.1.1.1) -
Bacillus stearothermophilus (strain IFO 12550) >gp:gp|D12681
|BACPBH7_1 Bacillus stearothermophilus esterase gene. NID: g
216313.

atgcaaattaaactaccaaaaccattcttttttgaagaagggaaacgtgcagtgttactt 
cttcacggctttacaggtaactctgctgatgtaagacaacttgggcgttatcttcaaaaa
aagggctatacatcttatgctccacaatatgaaggacatgcagcgcccccagaagaaata 
ttaaaatctagcccttttgtttggtttaaagatgttttagatggttatgattatttagta 
gatcaaggttacgaagaaatagcagtagctggtttatcattaggtggcgccttcgcatta 
aaactaagtttaaatcgtgatgtgaaggggattataactatgtgtgcacctatggagaat 
aaaacagaaggttcgatttatgaaggctttcttgaatatgcacgtaactttaaaaaatat
gaaggcaaagatcaacaaacgattgatcaagaaatggaacaatttcatccaactgaaacc 
ctgagagaactgagtgacactctaaatggagttaaagaacatgtcgatgaagtaattgat
ccaatacttgtcgtacaagcagaacaagatacaatgattgatcctcaatcagcaaattat 
atatataatcatgtcgattctgatgaaaaagaaatcaaatggtatcaacattcaggtcat 
gtgattaccattgataaagaaaaagagaaagtctttgaagatgtatatcaatttttagaa
tcattggaatggacagagtaa 

Sequence 1490

MQIKLPKPFFFEEGKRAVLLLHGFTGNSADVRQLGRYLQKKGYTSYAPQYEGHAAPPEEI 
LKSSPFVWFKDVLDGYDYLVDQGYEEIAVAGLSLGGAFALKLSLNRDVKGIITMCAPMEN
KTEGSIYEGFLEYARNFKKYEGKDQQTIDQEMEQFHPTETLRELSDTLNGVKEHVDEVID 
PILVVQAEQDTMIDPQSANYIYNHVDSDEKEIKWYQHSGHVITIDKEKEKVFEDVYQFLE 
SLEWTE* 

Sequence 1491

Contig_0610_pos_3380_5758,

is similar to (with p-value 0.0e+00)

>sp:sp|P44907|VACB_HAEIN VACB PROTEIN HOMOLOG. >pir:pir|G640
98|G64098 virulence associated protein homolog (vacB) homolo
g - Haemophilus influenzae (strain Rd KW20) >gp:gp|U32767|U3
2767_9 Haemophilus influenzae Rd section 82 of 163 of the co
mplete genome. NID: g1573868.

atgaatttaaagcaatccatcgaagaaatgataaaacaacctgactatgaacccatgtca 
gtatctgactttcaagatgcgttaggtttaaacagtgccgactcatttagagatttaatt
aaaatactcgttgaattagaacagtctggtttaattgaacgtacaagaacagacagatat
caacgtaaacaatccaacaaaacaaattcaaaactaatcaaaggaacgttaagtcaaaat 
aaaaaaggctttgctttcttaagacctgaagatgacgagatggatgatatttttattcca 
ccaactaaaatcaatagagcattagatggagatactgtcatcgtggaaattcaaaaatct 
cgtggagaacataaaggtaaaattgaaggtgaagtaaaatctattgaaaagcattcagtt
acacaagttgttggaacgtatagcgaagcaaagcattttggtttcgtattaccggatgac 
aaacgtattatgcaagatatctttatacctaaaggacaaaatttaggtgctgtagatggt 
cataaagtattagtacaaattacgaagtatgccgatagtactgacaatccagaaggccac 
gtctcagcaatattaggtcataaaaatgatccaggtgtagatatactttccatcatttac 

cagcatggaatagaaatcgagtttccagatgatgtattacaagaagctgaagaagtaccg
gatgtaatagaaccatctgaaatcgaagggcgtcgtgatttaagagatgaattaacaatc 
actatagatggcgcagatgctaaagatttagatgatgccattgctgtaaaaaaattaaaa 
aatggcaacaccgagcttacagttagtattgcagatgtaagttactatgtaaaagaagga 
tcagctttagataaagaagcttatgatcgtgcgacaagtgtgtatcttgtcgatcgagta 

atcccgatgattccacaccgtctaagtaatggaatatgctcattaaatccagaagaagat
cgtttaacattaagttgtcgaatggaaataaatgaacgaggcgaagttgtaaaacatgaa 
atctttgatagtgtaatacattcaaactacagaatgacatatgatgcagttaacaaaatt 
atcactgatcaagattctgaaatacgttcacaatataaagatttaacacctatgttagat
ttagcgcaagatttatcaaatagattaattcatatgcgcaaacgtcgtggagaaattgat 

tttgatattaatgaagcgaaagtacttgtgaatgacgaaggtattccaacagaagtgcta
atgagagaacgtggcgaaggagaacgtttaattgaatcattcatgttagcagccaatgaa 
acagtagctgaacacttcaataaattggaagtaccatttatttatcgtgttcatgaacaa 
ccaaaatctgaccgattaagacagttcttcgactttattaccaatttcggtattatgata 
aaaggtacaggtgaagatattcatccaacaacattacaaagcattcaagaagaagttgaa 

ggtagaccagaacaaatggttatttcaacgatgatgttacgttctatgcaacaagcacat
tatgatgatgttaatttaggacattttggtttgtctgctgagtactatactcactttacg
tctccaatacgccgttatcctgatttaacagtgcatagattaattcgtaaatatttaata 
gagaattctatggataaaaaagaaatacgtcattgggaagagacgttgccagaattagct 
gagcacacatcacaacgtgaacgccgtgccattgaagccgaacgtgatactgatgaattg 

aaaaaagctgagtatatgattcaacatattggtgatgaatttgaaggtatcattagctcg
gttgctaattttggtatgtttatagaattacctaatactattgagggtatggttcatatc 
gctaatatgacagacgattattatcattttgatgaacgacaaatggcactaatcggtgaa 
cgtcaagcaaaggtctttcgcattggtgatacggtcaaagttaaagtgacacatgttgat 
gtggaagaacgcatgatagatttccaaattgttggcatgccattacctaaaaagacatca 

tcacaacgacctgctcgtgagaaaaccattcaagctaaaacacgtggcaagtcgttagac
cacactaaaagtgatcgtaatggtaaaggtaaaaagaaaaaacgtaagcaacgtaaaggt 
aaaaatgcacgtaaaaaagataaacaaggtaatacgcatcacaaacctttttataaagat 
aaaagtgtgaagaagaaatcgcgtcgaaagaaaaaatag 

Sequence 1492

MNLKQSIEEMIKQPDYEPMSVSDFQDALGLNSADSFRDLIKILVELEQSGLIERTRTDRY 
QRKQSNKTNSKLIKGTLSQNKKGFAFLRPEDDEMDDIFIPPTKINRALDGDTVIVEIQKS 
RGEHKGKIEGEVKSIEKHSVTQVVGTYSEAKHFGFVLPDDKRIMQDIFIPKGQNLGAVDG 
HKVLVQITKYADSTDNPEGHVSAILGHKNDPGVDILSIIYQHGIEIEFPDDVLQEAEEVP 

DVIEPSEIEGRRDLRDELTITIDGADAKDLDDAIAVKKLKNGNTELTVSIADVSYYVKEG
SALDKEAYDRATSVYLVDRVIPMIPHRLSNGICSLNPEEDRLTLSCRMEINERGEVVKHE
IFDSVIHSNYRMTYDAVNKIITDQDSEIRSQYKDLTPMLDLAQDLSNRLIHMRKRRGEID 
FDINEAKVLVNDEGIPTEVLMRERGEGERLIESFMLAANETVAEHFNKLEVPFIYRVHEQ 
PKSDRLRQFFDFITNFGIMIKGTGEDIHPTTLQSIQEEVEGRPEQMVISTMMLRSMQQAH

YDDVNLGHFGLSAEYYTHFTSPIRRYPDLTVHRLIRKYLIENSMDKKEIRHWEETLPELA
EHTSQRERRAIEAERDTDELKKAEYMIQHIGDEFEGIISSVANFGMFIELPNTIEGMVHI 
ANMTDDYYHFDERQMALIGERQAKVFRIGDTVKVKVTHVDVEERMIDFQIVGMPLPKKTS
SQRPAREKTIQAKTRGKSLDHTKSDRNGKGKKKKRKQRKGKNARKKDKQGNTHHKPFYKD 
KSVKKKSRRKKK* 


Sequence 1493

Contig_0610_pos_5783_6253,

is similar to (with p-value 2.0e-48)

>sp:sp|P43659|SMPB_ENTFA SMALL PROTEIN B HOMOLOG. >gp:gp|M90
060|STRATPASEA_1 Streptococcus faecalis H+ ATPase a (atpB),b
(atpF),c (atpE), alpha (atpA), beta (atpD), gamma (atpG),delt
a (atpH), and epsilon (atpC) subunits, complete cds. NID: g15
3565.

gtggctaagaaaaaatcaaaatcaccaggtacgttagctgaaaatcgtaaagcaagacat
gactacaatattgaagatacaattgaagcgggtattgctttaagaggtactgaaattaaa 
tctatacgtcgtggtagtgccaatttaaaagatagctttgcgcaagtgagacgaggcgaa 
atgtacctgaataatatgcatattgcaccatatgaagaagggaaccgttttaatcatgac 
cctttacgtacacgtaaattactcttgcacaaaaaagaaattcaaaaattaggtgagcgt 
acacgagaaataggttattctattattccgttgaagttatatttaaaacatggtcaatgt
aaagttttattaggcgttgctagaggtaaaaagaaatacgacaaacgtcaagcacttaaa 
gaaaaagcggtaaaacgagatattgatcgcgcagttaaagcccgttattaa 

Sequence 1494

VAKKKSKSPGTLAENRKARHDYNIEDTIEAGIALRGTEIKSIRRGSANLKDSFAQVRRGE
MYLNNMHIAPYEEGNRFNHDPLRTRKLLLHKKEIQKLGERTREIGYSIIPLKLYLKHGQC 
KVLLGVARGKKKYDKRQALKEKAVKRDIDRAVKARY* 

Sequence 1495
Contig_0612_pos_2290_2928,

is similar to (with p-value 2.0e-20)

>gp:gp|AJ007319|LM0346165 Listeria monocytogenes ascB, inlG
, inlH, inlE, dapE genes. NID: g3980132.

atgccacacctaggtacaaatgctgttgatattttagttgattttgtaaatgaaatgaaa 
caagaatataaaaatattaaagaacatgataaagtacacgagttagacgctgttccaatg
attgagaaacatctccacagaaaaattggtgaagaagaatcacatatctactctggattt
gtaatgttaaactctgtattcaatggtggtaaacaagttaattctgttcctcataaagcg 
acagctaaatataatgtaagaactgttccagaatatgacagtactttcgtgaaggattta 
tttgaaaaagtcattcgtcatgtgggcgaagattatttaactgtagatatacctagcagt 
cacgatccagtggcaagtgatcgtgataatcctcttattcaaaatattacacgtattgca
ccgaattatgtacatgaagacattgttgtgagtgcattgattggtacaactgatgcatct 
agtttcctaggaacaaatgaaaataacgtggattttgctgtctttggacctggtgaatct 
attatggcgcatcaagttgatgaatttattagaaaagatatgtatttaagttacatcgat 
gtttataaagatgtatttaaagcatatctagaaaaataa 


Sequence 1496 

MPHLGTNAVDILVDFVNEMKQEYKNIKEHDKVHELDAVPMIEKHLHRKIGEEESHIYSGF 
VMLNSVFNGGKQVNSVPHKATAKYNVRTVPEYDSTFVKDLFEKVIRHVGEDYLTVDIPSS 
HDPVASDRDNPLIQNITRIAPNYVHEDIVVSALIGTTDASSFLGTNENNVDFAVFGPGES 
IMAHQVDEFIRKDMYLSYIDVYKDVFKAYLEK*

Sequence 1497

Contig_0612_pos_9229_10425,
putative peptide of unknown function 

atggttaaatttatacactgtgctgatttgcatttggacagtcctttcaaatctaaaagt
tatcttagtccaaatatttttgaagatgtccaaaagagtgcatatgaaagttttaaaaac 
atagtcgacttagctttaaaacaggaagtcgattttattattatagcaggtgatttattt 
gatagtgagaatcgtacattgcgtgctgaagtctttttaaatgaacaatttgaaagatta 
agaaaagaacaaatatttgtttatatttgccatggcaaccacgatcctcttacttctaaa 

ataacaagtcagtggcctaataacgtatccgtattttcaaatcaagtagagacatatcaa
gctatcactaaatcaggagaaacaatttatattcatggattcagctatcaaaatgatgcg 
agttatgaaaataaaatagacgcatacccatcaagtcaaggtcagaagggcatacatatt 
ggtgtattacatggaacttatagtaaatcttcggtgaaagaccgttatactgaatttagg 
ttagaagacttaaatcaacgtttataccactactgggcattaggacatatacaccaacgt 

gaacagttaagtgacatgccagtcattaactatccaggtaatatccaaggaagacatttc
aatgaattaggagaaaaaggttgtctattggtcgaaggtgatcatcttaaactcactaca
caattttatcctactcaatttattaaatttgaagaagctacaattgaaactgatcataca 
tctaaacaaggactttatgatgttattcaatcttttaaagataaagtaagaactgaaggg 
aaatcattttatagattgaacgtacgcattaatagtgaagacattattgcaccacaagat 






ttaattcaattaaaagaaatgattactgagttcgaagaaaacgaaaatcaatttgttttt 
attgaagatttaaatcttcaatatgttcaaaatgacgaaatgccaatagttaaagagttt 
tcaccagaattacttgatgatgcgtcactttttgattcggcaatgactgatttatatctt 
aatccaagggcttctaagtttttagatgactataatgaatttgataaagttgagttagtc 
aatcatgcagaaagacttttaaaggatgaaatgagaggtgaacaaaatgataattaa

Sequence 1498 

MVKFIHCADLHLDSPFKSKSYLSPNIFEDVQKSAYESFKNIVDLALKQEVDFIIIAGDLF
DSENRTLRAEVFLNEQFERLRKEQIFVYICHGNHDPLTSKITSQWPNNVSVFSNQVETYQ
AITKSGETIYIHGFSYQNDASYENKIDAYPSSQGQKGIHIGVLHGTYSKSSVKDRYTEFR
LEDLNQRLYHYWALGHIHQREQLSDMPVINYPGNIQGRHFNELGEKGCLLVEGDHLKLTT
QFYPTQFIKFEEATIETDHTSKQGLYDVIQSFKDKVRTEGKSFYRLNVRINSEDIIAPQD 
LIQLKEMITEFEENENQFVFIEDLNLQYVQNDEMPIVKEFSPELLDDASLFDSAMTDLYL 
NPRASKFLDDYNEFDKVELVNHAERLLKDEMRGEQNDN* 


Sequence 1499

Contig_0612_pos_11117_13354,

is similar to (with p-value 0.0e+00)

>gp:gp|U21636|SAU21636_1 Staphylococcus aureus cmp-binding-f
actor 1 (cbf1) and ORF X genes, complete cds. NID: g710420.
atgcatgagcaaaaacaaaaagaggttgctctacacgatcaaacacaagaatggaaaagg 
ttagaacagtcgcttaatatagagcctataaattttcctgaaaaagggatagatagatac 
gaaactgctaaatctcacaaacaatcacttgaacgagataaaagtttgcgagaagaaaga 
ttaagcatattaaataaagaggcggagtccatcaatccagtagaccaaaagtatattgat 

tcgtttaatagcctttatcaacaagagactgaaattaaacaaaaagaatttgagttacgt
tcaatagagaaagatattgctgataagcaacgtgaactagaagctcttcaatctaatata 
ggttggcaagaagtgttttacgacacagacagtactgaagcgatgaaaagtcatatgagt 
gatttagtattaggcaagcaagaacaaattgcttatatcaatcagttagaacgtggactt 
gaagaaaataaaattgaaagaaactctaattctaatgagattaatcaagttgagaatgag 

cttgttcctgacgaaacctttgaaaagaaaaaggaatatacacaacaagttttagaatta
catgaaaaagagaacttgtatgaaaagttaaaagaaacttttgaagaagaacaaacacaa 
aaaaataaaagacaaaagtttttgagaataggatttattgttttgactattctatcagca 
gcactttctattttttcttttttcactgcaaatcttatttttggtataatatttgctcta 
ttaactgtgatttttgtagtaggtatcattttttctagatctaaagcagtagattatagc 

acagcaataagtcaggaaattaatgatttagaaaaccaactcacgcaacttgaaaaagaa
tataatcttgacttcgatttagaatatcaacaacaagttcgtgaacaatggcgtcatgct 
aaaaaaaataaaaaaatacttgaagaaaaacatcaatatatcaatcaatcattaacgact 
gcaaatgagcgattagatagtttaaaacatagcattattgaaataaaaaaagagttacgt 
ttatcagaaaaactttctgatgaattagtggttgaaagtatctcaaccattggtcaaatt 

aaagcgcatgataaatacattattgatttaaatcaacaacgcaataatctgctaaaagat
atcaatcacttttatgaacgtgcacaatctgtaactgaaccacatttaaaactatttaat 
cagatgtctttcttccatgatgtgaaacagtggttaaaaaatgcagaagaacaaaatgag 
get tggaataaaaatcaaactgaaacgcaat tact caataatgaattaaagcaattgaag 
tcacgcttaagtgaaacgaatcaaatgattaagcaattatttgattatgttgatgtagat 

aatgaagaagattattatacacatcatcatcattttgaaacatatcaaagtgatttaaat
cgatttaatgatttaaatcaatatttagaaaatcaaaattacacttatgaaatgagttcg 
caattaagtgagaaaactactgctcaactagaagaagaagatcatagattggctaaacaa 
gttgacgattacaatgatcaatttttagaaatgcaagcagaagttagtgatttaaatgct 
cagattaatcatatggaaacagatagaactttagcacaattaagacatgaatattatagc 

ttaaaaaatagacttaacgatattgctaaggattgggcaagcttaagttatatgcaagct
ttagtggaagaacatatcaagcaaataaaagataagcgtctaccacaagtgattaatgaa
gctgtatctatttttaaaaatttaacaaatggtacttacaatatgattcattatactgaa
aatcataaaatacatgtaaagcattctaacggacaagtatttgagccagttgagttgagt 
caatctacaaaagaattattatatgtggctttacgtattagtcttattaaagtattaaaa 

ccgtattatccattcccagtgattgtagatgatgcatttgttcattttgataaatatcgt
aaagaacgtatgttgaaatatttgagagaactatcagaacattatcaaatactttatttt
acttgtacaaaagatcatgtcataccggctaaagaagtattaactttaaataaattacag
gaaggcgggaaaaaatga

Sequence 1500

MHEQKQKEVALHDQTQEWKRLEQSLNIEPINFPEKGIDRYETAKSHKQSLERDKSLREER 
LSILNKEAESINPVDQKYIDSFNSLYQQETEIKQKEFELRSIEKDIADKQRELEALQSNI 
GWQEVFYDTDSTEAMKSHMSDLVLGKQEQIAYINQLERGLEENKIERNSNSNEINQVENE 
LVPDETFEKKKEYTQQVLELHEKENLYEKLKETFEEEQTQKNKRQKFLRIGFIVLTILSA
ALSIFSFFTANLIFGIIFALLTVIFVVGIIFSRSKAVDYSTAISQEINDLENQLTQLEKE
YNLDFDLEYQQQVREQWRHAKKNKKILEEKHQYINQSLTTANERLDSLKHSIIEIKKELR
LSEKLSDELVVESISTIGQIKAHDKYIIDLNQQRNNLLKDINHFYERAQSVTEPHLKLFN
QMSFFHDVKQWLKNAEEQNEAWNKNQTETQLLNNELKQLKSRLSETNQMIKQLFDYVDVD
NEEDYYTHHHHFETYQSDLNRFNDLNQYLENQNYTYEMSSQLSEKTTAQLEEEDHRLAKQ
VDDYNDQFLEMQAEVSDLNAQINHMETDRTLAQLRHEYYSLKNRLNDIAKDWASLSYMQA
LVEEHIKQIKDKRLPQVINEAVSIFKNLTNGTYNMIHYTENHKIHVKHSNGQVFEPVELS
QSTKELLYVALRISLIKVLKPYYPFPVIVDDAFVHFDKYRKERMLKYLRELSEHYQILYF
TCTKDHVIPAKEVLTLNKLQEGGKK*


Sequence 1501 

Contig_0612_pos_13387_13890,

is similar to (with p-value 1.0e-81)

>gp:gp|U21636|SAU21636_2 Staphylococcus aureus cmp-binding-f
actor 1 (cbf1) and ORF X genes, complete cds. NID: g710420.
gtggatcattttttcttgatccatcgtgcaactcaaggtgttacagctcagggtaaagat 
tacatgacactatttctgcaagataaaagtggtgatattgaagctaaattatggactgct 
acgaaagatgatatgcaaactttaaaaccagaaacaatagttcatgtcaaaggtgatatc 
atcaattatcgtggacgcaaacagatgaaaatacatcaaatacgtcttgcacaagctgaa 
gacaaagtgtcaactaaagactttgttgacggtgcgccaatgtcacctacagaaatacaa
gaggaattatcgcattttatgttagatattgaaaatgctaacttacaacgcattactaga
catttaattaaaaagtatcaagatcgttttttcacttatccagcagctagttctcatcat
cataatttcgcgagtggattgagttatcatgttttaacaatgttgcgtatagcaaaatct 
gtatgtgatatttatcctctgtga 


Sequence 1502 

VDHFFLIHRATQGVTAQGKDYMTLFLQDKSGDIEAKLWTATKDDMQTLKPETIVHVKGDI 
INYRGRKQMKIHQIRLAQAEDKVSTKDFVDGAPMSPTEIQEELSHFMLDIENANLQRITR 
HLIKKYQDRFFTYPAASSHHHNFASGLSYHVLTMLRIAKSVCDIYPL* 


Sequence 1503 

Contig_0612_pos_16746_17183,

is similar to (with p-value 4.0e-52)

>pir:pir|JC2527|JC2527 alkaline shock protein - Staphylococc
us aureus >gp:gp|S76213|S76213_1 asp23=alkaline shock protei
n 23 {methicillin resistant} [Staphylococcus aureus, 912, Ge
nomic, 1360 nt]. NID: g894288.

atgctcaagtgcttttttgatatacatcaagagattgattattttaaaccttctctttca 
ttgttttctttgttatctttttcatttttttgttgccattctttttttgtcattacgtcg 

tcaacttgcatgttaacttcaacaacttctaaaccagtaatatattttacttgttcttta
actaagtctgtcactttacggaaaattttaggtgcagattcaccatattctaaaataact 
tttaaatctacagcagcttgtttttctccaacttctacagatacgcctgtagttacattg 
ttaccgtttgagaaagcgttagtaaagctatctgtgaagccacctttcatgtctaaaatt 
cctttaacttcacgtgctgcaatacctgcaattttttcaactacttcatctgagaaagtt 

aatttgttttcaaattga

Sequence 1504 

MLKCFFDIHQEIDYFKPSLSLFSLLSFSFFCCHSFFVITSSTCMLTSTTSKPVIYFTCSL
TKSVTLRKILGADSPYSKITFKSTAACFSPTSTDTPVVTLLPFEKALVKLSVKPPFMSKI
PLTSRAAIPAIFSTTSSEKVNLFSN*

Sequence 1505 

Contig_0612_pos_17095_16784,

is similar to (with p-value 2.0e-36)
>pir:pir|JC2527|JC2527 alkaline shock protein - Staphylococc
us aureus >gp:gp|S76213|S76213_1 asp23=alkaline shock protei
n 23 {methicillin resistant} [Staphylococcus aureus, 912, Ge
nomic, 1360 nt]. NID: g894288.
atgaaaggtggcttcacagatagctttactaacgctttctcaaacggtaacaatgtaact
acaggcgtatctgtagaagttggagaaaaacaagctgctgtagatttaaaagttatttta 
gaatatggtgaatctgcacctaaaattttccgtaaagtgacagacttagttaaagaacaa 
gtaaaatatattactggtttagaagttgttgaagttaacatgcaagttgacgacgtaatg 
acaaaaaaagaatggcaacaaaaaaatgaaaaagataacaaagaaaacaatgaaagagaa 
ggtttaaaataa

Sequence 1506 

MKGGFTDSFTNAFSNGNNVTTGVSVEVGEKQAAVDLKVILEYGESAPKIFRKVTDLVKEQ
VKYITGLEVVEVNMQVDDVMTKKEWQQKNEKDNKENNEREGLK*

Sequence 1507 

Contig_0612_pos_15967_15164, 
is similar to (with p-value 1.0e-73)

>gp:gp|Z79580|BS168NPRB_5 B. subtilis nprB gene. NID: g162092
1. >gp:gp|Z99109|BSUB0006_190 Bacillus subtilis complete gen
ome (section 6 of 21): from 999501 to 1209940. NID: g2633260
. >gp:gp|Y09476|BSY09476_54 B. subtilis 54kb genomic DNA frag
ment. NID: g2145361.

atgcaaccttatttaatttgtctagatctagatggtacattattaaatgacaataaagaa 
atctcaccttacactaaacaagtattaaccgaattacaacaatgtggacactacgttatg 
attgctactggaagaccctatcgcgcaagccagatgtattatcatgaactaaatatgagc 
acacctgttgttaactttaatggagcatttgtacatcatccaaaagcaaacgattttaaa 
gtgatacatgaagtacttgatgtggaaatttctaaaaatattattacagcacttcaacaa 
tctcatattacaaatatcattgctgaagtaaaagactatgtctttataaatagttatgat
tcaagactttacgaaggtttttcaatgggaaatcctaaaattcaaacaggtaatttactt
gaaaatcttaatgaagcacctacgtcattacttgttgaagcagaagaagaaaatattcct 
gaaattaaagatatgttaacacatttttatgcagaaaatattgaacatcgtcgttggggc 
gcaccgtttccagtaatagaaattgtgaagcgtgggattaacaaagcacgtggaatcaag 
catgttcaaaactatttaaacatcgccgacgatcatatcattgcgtttggtgatgaggac 
aatgatatagaaatgataaagtttgcgacccatggcattgcaatggccaatggcttgaaa 
gatttaaaggaaatagcaaatgagactacgtatagtaataatgaagacggaataggtcgt 
tatttaaatgacttttggtattaa

Sequence 1508 

MQPYLICLDLDGTLLNDNKEISPYTKQVLTELQQCGHYVMIATGRPYRASQMYYHELNMS
TPVVNFNGAFVHHPKANDFKVIHEVLDVEISKNIITALQQSHITNIIAEVKDYVFINSYD 
SRLYEGFSMGNPKIQTGNLLENLNEAPTSLLVEAEEENIPEIKDMLTHFYAENIEHRRWG 
APFPVIEIVKRGINKARGIKHVQNYLNIADDHIIAFGDEDNDIEMIKFATHGIAMANGLK
DLKEIANETTYSNNEDGIGRYLNDFWY* 


Sequence 1509 

Contig_0612_pos_8705_8037, 

is similar to (with p-value 9.0e-86)

>gp:gp|AF076683|AF076683_5 Staphylococcus aureus oligopeptid
e transporter putative substrate binding domain (opp-1A), ol
igopeptide transporter putative membrane permease domain (op
p-1B), oligopeptide transporter putative membrane permease d
omain (opp-1C), oligopeptide transporter putative ATPase dom
ain (opp-1D), and oligopeptide transporter putative ATPase d
omain (opp-1F) genes, complete cds; and unknown gene. NID: g
3800817.

gtgtcatttgattgccccactggtgcatcaatagcaattattggagaaagtggaagtgga 
aagtctacgttgagtcgtatgattttaggactagaaaaaccagatcaaggacaagtgacg 
ttggacggtcagcctgttcatttaaaaaaagtgagacgtcatcgaattgctgcggttttt 



caagactatacttcttctttgcatccttttcatactgtcaaagatattttgtttgeagta
atgaatcagtgtcgatgtacatcaaaggagaatatggaagagtatgtaacagcgttacta 
cgtgaagtagggttaaaatctgattgtttgtattgttatccacatatgctttcaggtgga 
gaagcacagcgtgtagctattgcacgtgcgatatgtatgcaaccagattatatattattt 
gatgaggcgattagttcattagatatgtctatgcaaacacaaatattagatttattgaaa
aggttacgtcactcacatcagctgagttatatttttattactcatgatatacaagctgcc 
acgtatatatgtgatgacttgcttatttttaaaaatggctgtatcgaagctaggacatct 
ataagtgaattgcataggcaacaaaatggttatacaagagaactgattgataaacaacta 
tcaatctaa 


Sequence 1510 

VSFDCPTGASIAIIGESGSGKSTLSRMILGLEKPDQGQVTLDGQPVHLKKVRRHRIAAVF 
QDYTSSLHPFHTVKDILFEVMNQCRCTSKENMEEYVTALLREVGLKSDCLYCYPHMLSGG 
EAQRVAIARAICMQPDYILFDEAISSLDMSMQTQILDLLKRLRHSHQLSYIFITHDIQAA
TYICDDLLIFKNGCIEARTSISELHRQQNGYTRELIDKQLSI*

Sequence 1511 

Contig_0612_pos_7974_6835,

is similar to (with p-value 0.0e+00)

>gp:gp|AF076683|AF076683_6 Staphylococcus aureus oligopeptid
e transporter putative substrate binding domain (opp-1A), ol
igopeptide transporter putative membrane permease domain (op
p-1B), oligopeptide transporter putative membrane permease d
omain (opp-1C), oligopeptide transporter putative ATPase dom
ain (opp-1D), and oligopeptide transporter putative ATPase d
omain (opp-1F) genes, complete cds; and unknown gene. NID: g
3800817.

atgttttttagtgcgaatgccatactcaatgttttcatacctctaagaggacatgacttg 
gaggcgacgaataccgtaattggaattgtaatgggagcttacatgctaacggcaatgcta 

tttcgcccttgggctggtcaaattattgcacgtgtaggaccgattaaagtattgcgtatt
atattattgattaatgctatggcactggtattatatgggtttacaggacttgaaggttat 
ttggttgcacgtatcatgcaaggtgtatgtacggcattcttctcaatgtctttacaattg 
ggtattatagatgctttacctgaaaaatatcgttcagaaggtgtatctctctattcattg 
ttttcaacaattcccaatttattaggaccattaattgcagttgggatttggcacgtggaa 

aatatgaccatatttgctattgttatgatttttattgcagtaacaacaaccttatttggt
tatagaactacttttgcaaatacacaaaaagaggtagcaccaaaagaagaagtcttacct 
tttaatgcaatgactgtatatgttcaattttttaaaaataaagcactcttctgcagtggt 
atgattatgattttgtcatctatcgtgtttggtgcgatgagtacttttataccattatat 
acagttagggaaggtttcgcgaatgcaggtattttccttacaattcaagccattacagta 

gtgatagctagattttatttacgtaagtatgtaccatctgatggtttatggcatcaccgg
tttatgatgattgtcttaacgttactgatgattgcttcaatcattgtagcttttggacca 
caaatattgagtatatttgtatatataagtgcaatctttattggaataacacaagcgctc 
gtttatcctacattgacaacgtatttaagttttgtcttaccaaagataggacgtaatatg 
ttattaggattgtttatagcatgtgcagatttagggatttcactaggaggtgtgctaatg 

gggccaatatcagatacggtaggatttaaatggatgtatattttatgcgctttattggtt
actattgcaatgatactaagtaaaattagacaaggacaaagtgtttctaaagcttcatag 



Sequence 1512 

MFFSANAILNVFIPLRGHDLEATNTVIGIVMGAYMLTAMLFRPWAGQIIARVGPIKVLRI
ILLINAMALVLYGFTGLEGYLVARIMQGVCTAFFSMSLQLGIIDALPEKYRSEGVSLYSL
FSTIPNLLGPLIAVGIWHVENMTIFAIVMIFIAVTTTLFGYRTTFANTQKEVAPKEEVLP
FNAMTVYVQFFKNKALFCSGMIMILSSIVFGAMSTFIPLYTVREGFANAGIFLTIQAITV
VIARFYLRKYVPSDGLWHHRFMMIVLTLLMIASIIVAFGPQILSIFVYISAIFIGITQAL 

VYPTLTTYLSFVLPKIGRNMLLGLFIACADLGISLGGVLMGPISDTVGFKWMYILCALLV
TIAMILSKIRQGQSVSKAS* 

Sequence 1513 
Contig_0612_pos_3672_3067,
is similar to (with p-value 2.0e-47)

>gp:gp|AF051356|AF051356_5 Streptococcus mutans YtqB (ytqB)
gene, partial cds; ABC transporter (abcX), putative permease
(perM), putative hemolysin (hlyX), pyruvate-formate lyase a
ctivating enzyme (pflC), D-alanine-D-alanyl carrier protein
ligase (dltA), integral membrane protein (dltB), D-alanyl ca
rrier protein (dltC), extramembranal protein (dltD), and put
ative exopolyphosphatase (ppxl) genes, complete cds; and unk
nown gene. NID: g2952523. >gp:gp|AB018417|AB018417_2 Strepto
coccus mutans genes for PFL-activating enzyme and PFLAE-5'OR
F, partial and complete cds. NID: g3986292.

gtgacggttgatgaaatggtaaatgaaatcttaccgtacaaaccttactttgaagcttca 
ggtggtggggtaacagtcagtggtggcgaaccattactacaaatgcctttcttggagcaa 
ttattcaaagaattaaaagcgaatggtgttcacacatgcattgatacttctgcgggatgt 

gtgaatgatacaccagcatttaatcgtcattttgatgaattgcaaaagcatacagattta
atcttattagatattaaacatattgataatgataagcacatcaaattaacaggcaaacct 
aacacacatattttaaagtttgcacgtaaattatctgatatgaaacaacctgtttggatt 
agacatgttttagtacctggtatttcggatgataaagaagatttgataaaactaggagaa 
tttattaattctttagataacgttgaaaagtttgaaatcttaccatatcatcaactcggt 

gtgcataagtggaaaaatttaggcatcccttatcaactcgaaaatgttgaaccatctgac
gatgaagcggttaaagaagcttatcgctatgttaactttaatggcaaaatacccgtaaca 
ttatag 

Sequence 1514 

VTVDEMVNEILPYKPYFEASGGGVTVSGGEPLLQMPFLEQLFKELKANGVHTCIDTSAGC
VNDTPAFNRHFDELQKHTDLILLDIKHIDNDKHIKLTGKPNTHILKFARKLSDMKQPVWI
RHVLVPGISDDKEDLIKLGEFINSLDNVEKFEILPYHQLGVHKWKNLGIPYQLENVEPSD
DEAVKEAYRYVNFNGKIPVTL*

Sequence 1515

Contig_0612_pos_1364_3,

is similar to (with p-value 3.0e-98)

>sp:sp|P54104|BRNQ_LACDL BRANCHED CHAIN AMINO ACID TRANSPORT
SYSTEM CARRIER PROTEIN. >pir:pir|S60180|S60180 branched-cha
in amino acid carrier brnQ - Lactobacillus delbrueckii >gp:g
p|Z48676|LDBRNQGN_1 L. delbrueckii brnQ gene for branched-cha
in amino acid carrier. NID: g732812.

atgatgaaaaataaattaacattaaaagagaatctatttatcggctcaatgctgtttggt 
cttttttttggtgctggaaatctcatttttccaattcacttaggtcaaactgcgggggca 

aatgtatggaccgccaatttaggatttcttatcacggctatcggactaccttttttagga
attatagcgataggtgtatctaaaacaaacggggtctttgaaatttcctcaaggataagt
aaaatatatggttatttgttcacaattggcttgtatcttgttataggtccgttttttgcg
ttgccaagacttgcgacgacgtcatttgaaatagcattttcaccatttatttcatctggt 
acggcccaagcgttgttgcctatttttagtattttattcttcggagtagcgtggttattt 

tcgcgtaaaccttctaaaatattagactatattggaaaattcttaaatccggtctttctc
atcttgcttggaattgttgttgtgcttgcatttatccgtcctatgggtggaattagtcat 
gcgccagtaagtgctgattatagcaatagcgtgttactcaaagggtttatcgatggatat 
aatacattagacgctttggcatcattagcatttggtattatcattgttactacaattaaa 
aagttggggattactaatccgaatacaatcgctaaagaaactttaaaatcaggtacgatt 

agtattatagctatgggcgttatttatactttattagctttaatgggtacgatgagttta
ggtcgttttaaagtaagtgaaaatggtggtattgcgcttgctcagattgcacaacattat 
ttaggggattacggaattattattttgtcactaatcatcattgtggcatgtctgaaaaca 
gcaataggattgatcacagccttttcggaaacatttacagagttattccctaaatctaac 
tatctttggttagctactggggtgagtatattagcttgtatatttgctaatgtaggttta 

acaaaaattattatgtattcaacaccagtgttgatgttcatttatcctttagcgattact
ttaattttattagcattacttagtccattatttaaacattctaaaattgtctatcgattt
acaacattatttacaatggtggcggcatttgtagatggtgtgaaagcaagtccagagttc
tttgttaatacaaaatttgcacaaacaatcattggatttggtgaaaattatctcccattc 
tttaacattggtatgggatggattgttccagcacttattggtttcattattggtattatt
gtatactttatgactgctaaaaaatcgtcccacgtacaataa
Sequence 1516 

MMKNKLTLKENLFIGSMLFGLFFGAGNLIFPIHLGQTAGANVWTANLGFLITAIGLPFLG 
IIAIGVSKTNGVFEISSRISKIYGYLFTIGLYLVIGPFFALPRLATTSFEIAFSPFISSG
TAQALLPIFSILFFGVAWLFSRKPSKILDYIGKFLNPVFLILLGIVVVLAFIRPMGGISH
APVSADYSNSVLLKGFIDGYNTLDALASLAFGIIIVTTIKKLGITNPNTIAKETLKSGTI
SIIAMGVIYTLLALMGTMSLGRFKVSENGGIALAQIAQHYLGDYGIIILSLIIIVACLKT
AIGLITAFSETFTELFPKSNYLWLATGVSILACIFANVGLTKIIMYSTPVLMFIYPLAIT
LILLALLSPLFKHSKIVYRFTTLFTMVAAFVDGVKASPEFFVNTKFAQTIIGFGENYLPF
FNIGMGWIVPALIGFIIGIIVYFMTAKKSSHVQ* 

Sequence 1517 

Contig_0613_pos_3835_4197,

putative peptide of unknown function

atgaatttgagtaatttcaaagttccaaaagttagattaggaaatagaacttatagtcaa 
agcgagctacaagactataggaaagccaatacacaaaggtataaccaagaggttagacac 
aataggcacaatagagagtatacagcgttctacaacagtacacagtggcgtaagttgcgt 
aaacaagtattattacgtgataactacttgtgtcaacattgtttaagtaaaggaatagtg 

aatgacaaagatttgattgttcaccataagattgaattaaaacgggactggtcgaaaaga
ctggatatggataatttagaggcagtgtgttttagctgccataataaaattcacggtgga 
taa 

Sequence 1518 

MNLSNFKVPKVRLGNRTYSQSELQDYRKANTQRYNQEVRHNRHNREYTAFYNSTQWRKLR
KQVLLRDNYLCQHCLSKGIVNDKDLIVHHKIELKRDWSKRLDMDNLEAVCFSCHNKIHGG 

* 

Sequence 1519 
Contig_0613_pos_8339_7815,

is similar to (with p-value 1.0e-49)

>sp:sp|P08064|DHSC_BACSU SUCCINATE DEHYDROGENASE CYTOCHROME
B-558 SUBUNIT. >pir:pir|A29843|DEBSSC succinate dehydrogenas
e (EC 1.3.99.1) cytochrome b558 - Bacillus subtilis >gp:gp|M
13470|BACSDHAB_1 B. subtilis succinate dehydrogenase complex
encoding cytochrome b-558 subunit, complete cds, and flavopr
otein subunit, 5' end. NID: g143524. >gp:gp|Z99118|BSUB0015_
110 Bacillus subtilis complete genome (section 15 of 21): fr
om 2795131 to 3013540. NID: g2635200. >gp:gp|Z75208|BSZ75208
_57 B. subtilis genomic sequence 89009bp. NID: g1769994.

atggtaaaccatcaagcaacgcaaggtgctgaagcttttaatagagcttcaggatttatg 
gaatctttaccattccttattgtgatggaatttatacttatttatataccattgttatac 
catggtttgttcggtttacacatcgcattcactgctaaggagaacatcgggcattactca 
ttatttagaaactggatgtttttcttccaacgtgtaagtggtattttagcatttgttttt 

attgcaatgcacttatggcaaacacgtttgcaaaaagctttttatggtaaatctgtggac
tataatctaatgcatgaaacattacaacatccgttatgggcaatcttttacattatttgt 
gtcattgctgttgttttccattttgctaatggtttatggtcattttgtgtaacatggggc 
tttttacaatctaaaaaatcacaacgtgtttttacttggatttcactcatagtattttta 
gtgatttcttatattggtgttgcagccgttattgcgtttatataa 


Sequence 1520 

MVNHQATQGAEAFNRASGFMESLPFLIVMEFILIYIPLLYHGLFGLHIAFTAKENIGHYS
LFRNWMFFFQRVSGILAFVFIAMHLWQTRLQKAFYGKSVDYNLMHETLQHPLWAIFYIIC 
VIAVVFHFANGLWSFCVTWGFLQSKKSQRVFTWISLIVFLVISYIGVAAVIAFI* 


Sequence 1521 

Contig_0613_pos_7614_6403,

is similar to (with p-value 0.0e+00)

>pir:pir|A27763|A27763 succinate dehydrogenase (EC 1.3.99.1)
flavoprotein - Bacillus subtilis
atgtcaacaattaaagcggcagaacaaggtgcacatgtagatttattttccattgtaccg 
gtaaagcgttcgcactctgtttgtgcacaaggtggcataaatggtgctgttaatactaaa 
ggtgagggagattcaccgtggattcactttgatgatactgtttatggtggagacttcctg 
gctaatcaaccaccagtcaaagcaatggctgatgctgcacctaaaatcatccatctgtta
gatcgtatgggggttatgtttaacagaacgaaagaaggcttattagactttagacgtttt 
ggcggtacactacatcatagaacagcttttgctggcgcaacgacaggtcaacaattgctt 
tatgcattagatgagcaagttcgttcatttgaggtagatggtttagtaactaaatacgaa 
ggatgggaatttctaggtattgttaaagacgaagaagatgctgcaagaggtattgttgct 

caaaatatgacaacatcagaaattcaatcattcggttcagatgctgtcatcatggcaaca
ggtggtcctggtattatctttggtaaaacgacgaattcaatgattaatacaggttcagcg 
gcgtcaatcgtttatcagcaaggtgcgatttatgcaaatggtgaattcatccaaatacat 
ccgactgcgattcctggagatgacaaattacgtcttatgagtgaatcagctcgtggtgaa 
ggtggacgtatttggacgtataaagatggtaaaccttggtacttcttagaagaaaaatat 

ccagactatggtaacttggttccacgtgatatagcgacacgtgaaattttcgatgtttgt
attaaccaaaagttaggtatcaatggagaaaacatggtataccttgatttatctcataaa 
gatccacacgaattggatgttaaattaggtggtattattgaaatttatgaaaaattcaca 
ggtgatgatccacgtaaagttccaatgaaaatcttcccagcagtgcattattcaatgggt 
ggtttatacgtagactatgatcaaatgactaatatcaaagggttatttgcagctggagaa
tgtgatttctcacaacatggtggtaaccgtttaggtgccaattctttactttcagctatt
tataacataagcaaatatggtagccatgagtggatctattttttctcggttcatcttctt 
ttcaatcattaa 

Sequence 1522 

MSTIKAAEQGAHVDLFSIVPVKRSHSVCAQGGINGAVNTKGEGDSPWIHFDDTVYGGDFL
ANQPPVKAMADAAPKIIHLLDRMGVMFNRTKEGLLDFRRFGGTLHHRTAFAGATTGQQLL
YALDEQVRSFEVDGLVTKYEGWEFLGIVKDEEDAARGIVAQNMTTSEIQSFGSDAVIMAT 
GGPGIIFGKTTNSMINTGSAASIVYQQGAIYANGEFIQIHPTAIPGDDKLRLMSESARGE 
GGRIWTYKDGKPWYFLEEKYPDYGNLVPRDIATREIFDVCINQKLGINGENMVYLDLSHK 

DPHELDVKLGGIIEIYEKFTGDDPRKVPMKIFPAVHYSMGGLYVDYDQMTNIKGLFAAGE
CDFSQHGGNRLGANSLLSAIYNISKYGSHEWIYFFSVHLLFNH*

Sequence 1523 
Contig_0614_pos_6876_6298,
is similar to (with p-value 7.0e-19)

>gp:gp|U40604|LMU40604_2 Listeria monocytogenes ClpC ATPase
(mec) gene, complete cds. NID: g1314293.

gtgaggtgtttaaaattgctttgtgaaaattgccattttaatgaagcggaagttaaactt 
actgttaaaggtatagatagtacgcatgaaaaatgggtatgttcagtatgtgcccaagga 

gaaaacccctggttacattctaacgatgataatacgtatcatacacaccaagacgatata
gaagaagcatttgtagtgaaacagatacttcaacaccttgctgcaaaacatggtattaat 
tttcatgagatggcatttaaagaagaaaaaaaatgcccaacgtgtcagatgacacttaag 
gatattgcacatgttggtaagcttgggtgtgctgattgttatgctacgtttaaagaagac 
atcattgatatagttcaacgtgttcaaggtggtcaatttgaacatgtaggaaaaacacca 

caatcatcgtataagaaacttgcaataaaaaagcaaattgaagaaaaatcaaaatatcta
aataaattgatagatggtcaagagtttgaagaggcagcgattgttcgtgatgaaattaaa 
gctttaaaaagtgagagcgaggtgtctcatgatgagtaa 

Sequence 1524 

VRCLKLLCENCHFNEAEVKLTVKGIDSTHEKWVCSVCAQGENPWLHSNDDNTYHTHQDDI
EEAFVVKQILQHLAAKHGINFHEMAFKEEKKCPTCQMTLKDIAHVGKLGCADCYATFKED 
IIDIVQRVQGGQFEHVGKTPQSSYKKLAIKKQIEEKSKYLNKLIDGQEFEEAAIVRDEIK 
ALKSESEVSHDE* 

Sequence 1525

Contig_0614_pos_6266_5301,

is similar to (with p-value 3.0e-70)

>sp:sp|P37570|YACI_BACSU HYPOTHETICAL 41.1 KD PROTEIN IN LYS
S-MECB INTERGENIC REGION (ORFX). >gp:gp|D26185|BAC180K_147 B
. subtilis DNA, 180 kilobase region of replication origin. N
ID: g467326. >gp:gp|Z99104|BSUB0001_85 Bacillus subtilis com
plete genome (section 1 of 21): from 1 to 213080. NID: g2632
267.

atgtctgaggagacacctgttattatttcttccagaattcgattagctagaaatcttgaa
aaccatgtccacccacttatgttcccttcagagcaagaaggatatcgagtgataaatgaa 
gttcaagatgcgctttccaacttaactttaaatcgattagatacgatggatcaacaaagt 
aaaatgaaattggttgcgaaacatcttgtgagtcctgaactagtgaaacaacctgcttca 
gcagtaatgttaaatgatgatgaatcggtaagtgttatgataaacgaagaagatcatata 

cgaatacaggctctaggaactgatttatcgctaaaggatttatatcaacgcgcttctaaa
attgatgatgaattagataaagcgttagacattagttatgatgagcatttaggatattta 
actacctgtcctactaatattggtacaggaatgcgtgcaagtgtgatgttacatttacct 
ggactctccattatgaaaagaatgaacagaattgcacaaacaattaatcgttttggattt 
acaattcgaggtatatacggagaagggtcacaagtatatggtcacatttatcaggtttca 

aaccaacttacactagggaaaacagaagaagacattatcgataacttaactgaagttgta
aatcaaattataaatgaagaaaagcaaataagagaaagacttgataaacacaattctgta
gagacactggatagagtttatcgatcattaggtgtactacaaaacagtagaattatttct 
atggaagaagcctcatatcgtttgagcgaagtgaaactaggtattgatttgaattatatt 
ttgcttgaaaattttaaatttaatgaattaatggtagcaatacagtcaccatttttaata 

gatgacgatgataatagaacagtaaatgaaaaaagagctgatttattaagagaacatata
aaatag 

Sequence 1526 

MSEETPVIISSRIRLARNLENHVHPLMFPSEQEGYRVINEVQDALSNLTLNRLDTMDQQS 
KMKLVAKHLVSPELVKQPASAVMLNDDESVSVMINEEDHIRIQALGTDLSLKDLYQRASK
IDDELDKALDISYDEHLGYLTTCPTNIGTGMRASVMLHLPGLSIMKRMNRIAQTINRFGF
TIRGIYGEGSQVYGHIYQVSNQLTLGKTEEDIIDNLTEVVNQIINEEKQIRERLDKHNSV
ETLDRVYRSLGVLQNSRIISMEEASYRLSEVKLGIDLNYILLENFKFNELMVAIQSPFLI
DDDDNRTVNEKRADLLREHIK* 


Sequence 1527 

Contig_0614_pos_5287_2834, 

is similar to (with p-value 0.0e+00)

>sp:sp|P37571|MECB_BACSU NEGATIVE REGULATOR OF GENETIC COMPE
TENCE MECB. >gp:gp|D26185|BAC180K_148 B. subtilis DNA, 180 k
ilobase region of replication origin. NID: g467326. >gp:gp|U
02604|BSU02604_2 Bacillus subtilis Marburg 168 ClpC adenosin
e triphosphatase (mecB) gene, complete cds, orfX and orfY, p
artial cds. NID: g442358. >gp:gp|Z99104|BSUB0001_86 Bacillus
subtilis complete genome (section 1 of 21): from 1 to 21308
0. NID: g2632267.

atgttatttggtagattgacagagcgtgcacaacgtgtgttggcacatgcacaagaggaa 
gcaattcgtttgaaccattctaatattggaacagaacatcttttgcttggtttaatgaaa 
gagccagaaggtatagcagcaaaggtattagtaagttttaatattactgaagataaagtc 

atcgaagaagttgaaaaacttatcggtcacggtcaagagcaaatgggcacactacattat
acaccgagagcaaaaaaagtaattgaactgtctatggatgaagctcgaaagctacatcat 
aactttgtaggaacagagcatatactattaggtttaattagagaaaatgaaggtgttgca 
gcacgtgtatttgcaaacctagatttaaatattactaaagcacgtgcccaagttgtaaaa 
gctttaggaagtccagaaatgagtaataaaaatgcgcaagctaataagtctaataacacg 

cctactttagatggattagctagagatttaactgttattgctaaagatggaacgttagat
ccagtcgtaggacgagataaagaaattactcgtgtaattgaagttttaagtcgtcgtact 
aaaaataatcctgtgctaattggtgaacccggtgttggtaaaacagcaattgctgaaggg 
cttgcgcaagcaattgttaaaaatgaagtaccagaaactttaaaagacaaacgtgtaatg 
tcattagatatgggtacagtcgtagctggcactaaatatcgtggtgaatttgaagaaaga 

ttgaaaaaagttatggaggaaatccatcaagctggtaatgttattctatttatcgatgaa
cttcatactttagttggcgctggtggcgcagaaggagcaattgatgcatctaatatttta 
aaacctgctttagctcgtggagaattgcaatgtataggtgccacaacattagatgaatat 
cgtaaaaatatagaaaaagacgctgcattagaacgtcgttttcaaccaattcaagtggat 
gaacctacagttgaagacacgattgaaatcttaaaaggattacgtgaccgttatgaggct
catcacagaattaatatctcagatgaagctttagaagcggctgctaaattgagtgatcgc
tatgtttcagatcgtttcttgccagataaagccattgacttaattgatgaggcaagttca 
aaagttagacttaaaagtcatacaacgccaagtaatttaaaagagattgaacaagaaatt 
gataaagtaaaaaatgaaaaagatgctgcagttcatgctcaagaatttgaaaatgccgct 
aatttaagagataagcaatctaaacttgaaaagcaatatgaagatgctaaaaatgaatgg
aaaaatgcacaaggtggtttagatactgccttatctgaagaaaatatcgctgaagtaata 
gctggttggacaggtattcctttaactaaaattaatgaaactgaatcagatcgtttattg 
aatcttgaagatacacttcataaacgtgtcattggacaaaacgatgctgtcaattcaatt
agtaaagctgttagaagagctcgtgctggtcttaaagatccaaaacgtccaatcggtagt 

tttattttcttaggacctacaggtgtgggtaaaactgaattggctcgtgctttagctgaa
tctatgtttggtgaagacgatgcaatgattcgcgtagatatgagtgaatttatggagaaa
catgctgtcagtcgattagttggtgcacctccaggatatgtaggacatgatgacggcggt 
caattgactgaaaaagttagacgtaaaccatactctgtgattttatttgatgaaattgag 
aaagcacatcctgacgtatttaatattcttctacaagttttagatgatggtcatttaaca 

gatactaaaggtcgtactgtggacttccgtaatactgtgattattatgacttctaatgtg
ggagctcaagaattacaggaccaacgctttgctggttttggaggtgcttcagaaggtagt
gactacgaaactgtcagaaaaacaatgatgaaagaattaaaaaattcattccgaccagaa
ttcttaaaccgtgttgatgacattattgtcttccacaaacttacaaaagatgaattaaaa 
gaaattgttacaatgatggtaaataaacttactcaccgtctttcagagcaaaatattaat 

attgttgttactgataaagcgaaagaaaaaattgcagaagaaggatatgatcctgaatat
ggtgctagaccactcattagagcaattcaaaaaacggttgaagataatttaagcgaattg 
attttagatggaaataaaattgaaggtaaagaagtaacaattgatcatgatggtaaagaa 
tttaagtatgatatttatgaaattacagctaaaaaagaaacaacagaatcataa 

Sequence 1528

MLFGRLTERAQRVLAHAQEEAIRLNHSNIGTEHLLLGLMKEPEGIAAKVLVSFNITEDKV 
IEEVEKLIGHGQEQMGTLHYTPRAKKVIELSMDEARKLHHNFVGTEHILLGLIRENEGVA 
ARVFANLDLNITKARAQVVKALGSPEMSNKNAQANKSNNTPTLDGLARDLTVIAKDGTLD
PVVGRDKEITRVIEVLSRRTKNNPVLIGEPGVGKTAIAEGLAQAIVKNEVPETLKDKRVM 

SLDMGTVVAGTKYRGEFEERLKKVMEEIHQAGNVILFIDELHTLVGAGGAEGAIDASNIL
KPALARGELQCIGATTLDEYRKNIEKDAALERRFQPIQVDEPTVEDTIEILKGLRDRYEA 
HHRINISDEALEAAAKLSDRYVSDRFLPDKAIDLIDEASSKVRLKSHTTPSNLKEIEQEI 
DKVKNEKDAAVHAQEFENAANLRDKQSKLEKQYEDAKNEWKNAQGGLDTALSEENIAEVI 
AGWTGIPLTKINETESDRLLNLEDTLHKRVIGQNDAVNSISKAVRRARAGLKDPKRPIGS 

FIFLGPTGVGKTELARALAESMFGEDDAMIRVDMSEFMEKHAVSRLVGAPPGYVGHDDGG
QLTEKVRRKPYSVILFDEIEKAHPDVFNILLQVLDDGHLTDTKGRTVDFRNTVIIMTSNV
GAQELQDQRFAGFGGASEGSDYETVRKTMMKELKNSFRPEFLNRVDDIIVFHKLTKDELK
EIVTMMVNKLTHRLSEQNINIVVTDKAKEKIAEEGYDPEYGARPLIRAIQKTVEDNLSEL
ILDGNKIEGKEVTIDHDGKEFKYDIYEITAKKETTES*


Sequence 1529 

Contig_0614_pos_1884_970, 

is similar to (with p-value 0.0e+00)

>sp:sp|P37572|RADA_BACSU DNA REPAIR PROTEIN RADA HOMOLOG (DN
A REPAIR PROTEIN SMS HOMOLOG). >gp:gp|D26185|BAC180K_149 B.
subtilis DNA, 180 kilobase region of replication origin. NID
: g467326. >gp:gp|Z99104|BSUB0001_87 Bacillus subtilis compl
ete genome (section 1 of 21): from 1 to 213080. NID: g263226
7.

gtgatacatcaaactgtaaaagaagagagacctgacttacttgttgttgattcgattcaa
acaatctatcatccggaaattagttccgcacctggatcggtatcacaagtaagagagagt 
acgcagagtttaatgaacattgctaaacaaatgaatattgccacatttattgtgggacac 
gtaacaaaagaaggacaaatcgccggaccaagattattggaacatatggttgatacagtt 
ctttattttgaaggagatgagcatcacgcatatcgtatccttagagcagtaaaaaataga 

tttggttctacaaatgagatggggattttcgaaatgaagcaaagtggattaaaaggtgta
cttaatccttctgaaatgtttttagaagaacgttctacaaatgttccgggctctacaatc 
gtccccactatggaaggaacaagaccactactcattgaagtccaagcgcttgttacacca 
acaacatttaataatcctagacgaatggctacaggtatagatcataatcgattaagttta 
cttatggcggttctagaaaaaaaggaaaactatttactccaacaacaagatgcctatatt
aaagtagcaggtggcgtcaaattaacagaacctgctgttgatttaagcattattgttgcg
acagcttcaagttttaaagatcaagctgttgatggattagattgttttgtgggtgaagtt 
ggattaacaggtgaagtacgcagagtatctcgcatagagcaacgtgttcaagaagcgacc 
aaactagggtttaaaagagctattattccacagacaaatattggaggttggacattccca 
gaaggcatccaagtcgttggtgtttcatcagtacatgaagctttgaaatatgcattacat
tcaaaacagcgataa 

Sequence 1530 

VIHQTVKEERPDLLVVDSIQTIYHPEISSAPGSVSQVRESTQSLMNIAKQMNIATFIVGH 
VTKEGQIAGPRLLEHMVDTVLYFEGDEHHAYRILRAVKNRFGSTNEMGIFEMKQSGLKGV
LNPSEMFLEERSTNVPGSTIVPTMEGTRPLLIEVQALVTPTTFNNPRRMATGIDHNRLSL
LMAVLEKKENYLLQQQDAYIKVAGGVKLTEPAVDLSIIVATASSFKDQAVDGLDCFVGEV
GLTGEVRRVSRIEQRVQEATKLGFKRAIIPQTNIGGWTFPEGIQVVGVSSVHEALKYALH
SKQR* 


Sequence 1531 

Contig_0614_pos_0_934,

is similar to (with p-value 1.0e-46)

>gp:gp|U40604|LMU40604_6 Listeria monocytogenes ClpC ATPase
(mec) gene, complete cds. NID: g1314293.

atggaaggaggctataaattgaatataacaaaagcaattgttgtagcaatctatatcatt
gttggtgcagcacttggtgttataattatacccgaagttgttacagatcttggcattcat
caccatgcggttatcactaattattatgtagatggtttcatagggatcattatatttttt
ataatatttggattgttcattaataaagtaacatatgcttttaaacaatttgaacaatta 

atcatgagacgtagtgcggtagaaatattatttgctacaattggtttaattattggttta
tttatttcagtgatggtttcttttatcttagaaatgataggtaattccatattaaatcac 
tttgtacctatgataatcactattattttatgttatttagggtttcaatttggtctgaaa 
aaaagagatgaaatgcttatgtttttaccagagaatatggcacgttccatgtctaataat 
atacgaagagcgacacctaagattgtagatacaagtgccattatcgatggaaggatatta 

gatattatacgttgcggatttatcgatggtgatatattgataccacaaggcgttataaat
gaattacaggttatagcggatgctaaagatagcgtgaaacgtgaaaaaggtcaaagagga 
ttagatattttgaatcaactttatgatttagattatcctacacgcgttatacatccaact 
caatcccatagtgatatagatacattattaattaaattagcacaacagtatcatgcacat 
gtgattacgactgattttaatttaaataaagtatgtcacgttcaaggaattacagcactc 

aacgttaatgatttatcggaagcaatcaaacctaatgtacatcaaggcgaccagttaagt
attttattaacgaagataggtaaagagCTTTTTA 

Sequence 1532 

MEGGYKLNITKAIVVAIYIIVGAALGVIIIPEVVTDLGIHHHAVITNYYVDGFIGIIIFF
IIFGLFINKVTYAFKQFEQLIMRRSAVEILFATIGLIIGLFISVMVSFILEMIGNSILNH
FVPMIITIILCYLGFQFGLKKRDEMLMFLPENMARSMSNNIRRATPKIVDTSAIIDGRIL 
DIIRCGFIDGDILIPQGVINELQVIADAKDSVKREKGQRGLDILNQLYDLDYPTRVIHPT 
QSHSDIDTLLIKLAQQYHAHVITTDFNLNKVCHVQGITALNVNDLSEAIKPNVHQGDQLS 
ILLTKIGKELFX 


Sequence 1533 

Contig_0615_pos_391_900, 

is similar to (with p-value 1.0e-48)

>gp:gp|U00013|U00013_9 Mycobacterium leprae cosmid B1496. NI
D: g466868.

atgaaatatccaaactgtgtacttttaggtgaaggtgccaaaggtagcacattatccatt 
gcatttgctggtaaaggtcaagttcaagatgctggtgctaaaatgattcataaagcacct 
aatacatcttcaactattgtttctaaatctatctccaaaaatggtggtaaagtcatttat
cgtggtatcgttcattttggacgtaaagctaagggagcacgttcaaatatcgaatgtgat
acattaattttagataatgaatcgacttcagatactatcccttataatgaagtgttcaat
gacaatatttcattagaacatgaagctaaagtttctaaagtatcagaagagcaattattc 
tatcttatgagtcgtggtatttctgaggaagaagcgacagaaatgattgttatgggattc 
attgagccgttcacaaaagaattaccaatggaatacgcagtagaaatgaaccgtttaatt 
aagtttgaaatggaaggctcaattggttaa

Sequence 1534

MKYPNCVLLGEGAKGSTLSIAFAGKGQVQDAGAKMIHKAPNTSSTIVSKSISKNGGKVIY
RGIVHFGRKAKGARSNIECDTLILDNESTSDTIPYNEVFNDNISLEHEAKVSKVSEEQLF
YLMSRGISEEEATEMIVMGFIEPFTKELPMEYAVEMNRLIKFEMEGSIG*

Sequence 1535 

Contig_0615_pos_2732_3145,

putative peptide of unknown function 

atgatgaataaagcaattaatataacatcattgataggaatcattttacaaagtttttct
agtctattgtttttagttttcttggtattttcaattactggagcgatggatgctaatttt 
actacaacagttaatggtgaaacaacagttcatagtgcagaagccgcacagcatgttttt 
agtgttggatttttggttatattcatcgttgcaattttttcaatcatttttggaattata 
ggtatgatgaagaaaaaaactaatactatagcaagtggtgtattttatattataggtgca 

atattaagtttaaatatgattacttttatatcttggctagtgtgtggaatattattaatt
aaaaaaagacaaaataaaagcataaaagacaataaaacacatttggtggattag 

Sequence 1536 

MMNKAINITSLIGIILQSFSSLLFLVFLVFSITGAMDANFTTTVNGETTVHSAEAAQHVF 
SVGFLVIFIVAIFSIIFGIIGMMKKKTNTIASGVFYIIGAILSLNMITFISWLVCGILLI
KKRQNKSIKDNKTHLVD*

Sequence 1537 

Contig_0615_pos_3907_4716,

is similar to (with p-value 1.0e-22)

>gp:gp|U93876|BSU93876_19 Bacillus subtilis aminoglycoside 6
-adenylyltransferase (aadK) gene, partial cds, and YrdA (yrd
A), YrdB (yrdB), hypothetical protein YrdC (yrdC), YrdD (yrd
D), hypothetical cytochrome P450 protein YrdE (yrdE), ribonu
clease inhibitor (yrdF), regulatory protein YrdG (yrdG), hyp
othetical protein YrdH (yrdH), hypothetical protein YrdI (yr
dI), amino acid transporter (yrdJ), YrdK (yrdK), LysR family
regulatory protein YrdL (yrdL), YrdN (yrdN), cation transpo
rt protein YrdO (yrdO), hypothetical protein YrdP (yrdP), Ly
sR family transcription regulator YrdQ (yrdQ), hypothetical
protein YrdR (yrdR) and hypothetical protein YrkA (yrkA) gen
es, complete cds. NID: g1934641. >gp:gp|Z99117|BSUB0014_140
Bacillus subtilis complete genome (section 14 of 21): from 2
599451 to 2812870. NID: g2634966.

gtgacaatcttagcgattgatattggagtgaatgtgggaatagcatcagcaattgtaaca
attgtgattatacttatttctgaagtgattcctaaatcaattgctgcaacatttcctgat 
aaaatttcaaaacttgtgtatcctatcattcatatatgtgttattgtactcaagcccatt 
acaatcttattaaacaagatgacagatggtattaatcatttactatctcgaggccaacct 
gttgaaaaaagattttctaaagaagaaattcgtacattattaaatattgcgggtagagaa 

ggtgcatttaatgagatagaaaatactcgacttcaaaacgttatggactttgaacaattg
aaggttaaggatgttgataccacgcctcgtattaatgttgtagctttttcaaaggaagta 
acatatgacgaagcttatgatacagtgatgaataacccatatacaagatatccagtatat 
gatgaaaatatagatgatatcatcggcgtattccactcaaaatatttattagcttggagt 
aaaaataaagaggacgcaattactaattatgcatcaagccctttatttgtaaatgaacat 

aatagggcagaatgggtattgcgtaaaatgaccgtttcacgaaaacatttagcgattgtt
ttagatgaatttggaggtacggatgctatcgtatcgcacgaagatttaatagaagagcta 
cttggtatggatattgaggatgaaatggatcgtgaagaagaaaataaattaaaacatcaa 
aaatttccgcaaagcatgatgcatcgttaa 

Sequence 1538

VTILAIDIGVNVGIASAIVTIVIILISEVIPKSIAATFPDKISKLVYPIIHICVIVLKPI 
TILLNKMTDGINHLLSRGQPVEKRFSKEEIRTLLNIAGREGAFNEIENTRLQNVMDFEQL 
KVKDVDTTPRINVVAFSKEVTYDEAYDTVMNNPYTRYPVYDENIDDIIGVFHSKYLLAWS
KNKEDAITNYASSPLFVNEHNRAEWVLRKMTVSRKHLAIVLDEFGGTDAIVSHEDLIEEL
LGMDIEDEMDREEENKLKHQKFPQSMMHR*

Sequence 1539 
Contig_0615_pos_5401__0,
putative peptide of unknown function

atggtgctcgtacaatttccaccttggtttgattgtaacgtccaaaatataaattacatc 
ttatatgtgagaaaacaattaactgatattccgatgagcattgaatttagacatcaatca 
tggtttgacaatcagtataaagaacaaactttatccttcttaacacaacatcaaatcatt 
catgcagtggtagatgaacctcaagttaaagaggggagcgttcctttagtaaataggatt 
10 actagtgaaattgcttttgtacgttatcatggacgtaatcattatggttggactaaaaaa 
gatatgactgatcaagaatggcgagatgtaagatatttatatgattatagcgatgatgag 
ttagctgacttggctcgtaaagtcgaaatacttaatcaaaaggctaagaaagta 

Sequence 1540 

MVLVQFPPWFDCNVQNINYILYVRKQLTDIPMSIEFRHQSWFDNQYKEQTLSFLTQHQII
HAVVDEPQVKEGSVPLVNRITSEIAFVRYHGRNHYGWTKKDMTDQEWRDVRYLYDYSDDE 
LADLARKVEILNQKAKKV 

Sequence 1541 
Contig_0616_pos_8264_0,

putative peptide of unknown function 

atgtccgcgtttattgaacaatctcaatatattgcgattcataatcaagataatttatat 
gatgatttattccagtttttagtaaaaataaaagatatctataaaacaaaactaggtagt 
gctgtgattgaaatattaattagtcatcaacaaatggaagctagagaaacttttatgact
aattactttaatcataatcgcaaagttttaaaagagattgttcgtaagcacatacaagag
gaagaacaagatttgtttattgatttaatcttctcacccatctattttaatatattaatt 
aaacctgaaactctggatga 

Sequence 1542 

MSAFIEQSQYIAIHNQDNLYDDLFQFLVKIKDIYKTKLGSAVIEILISHQQMEARETFMT
NYFNHNRKVLKEIVRKHIQEEEQDLFIDLIFSPIYFNILIKPETLDX

Sequence 1543 

Contig_0616_pos_7255_6641,

putative peptide of unknown function

atgaatgctccatttattttaatagctgatcctagaatcgaaggtggtgccttttaccta 
gggtcagagaattatgaacaggcaatccgtaaggtcatccaaaatgctttggattatttg 
ggatttgcgaacaaccaattaattctttctggattatcaatgggatcatttggcgcactt 
tattacgctacaaaattaaatccagcggctgttattgtaggaaaacctttgataaatctc 

40 ggtactattgctaataatatgaaactcgttcgtccaaacgattttggaacgtcacttgat 
attttgcgattgaatcaaaatggcataactaacaaagatgttgttcagttagataatcat 
ttttggaagcaaattcagcatagtgatttgtcaatgaccacatttgcgattgcttacatg 
gagcatgatgattatgacaaatatgcatttcaagatttattgcctgttcttacaaaacaa
catgcacgtgtgataagtaaaagaattcctggtagacataatgatgattctgctactgtt 

actcattggtttattaatttttataatttaatcatggaagagcgatttgggagggtaaca
catgcaagaagatag

Sequence 1544 

MNAPFILIADPRIEGGAFYLGSENYEQAIRKVIQNALDYLGFANNQLILSGLSMGSFGAL
YYATKLNPAAVIVGKPLINLGTIANNMKLVRPNDFGTSLDILRLNQNGITNKDVVQLDNH
FWKQIQHSDLSMTTFAIAYMEHDDYDKYAFQDLLPVLTKQHARVISKRIPGRHNDDSATV 
THWFINFYNLIMEERFGRVTHARR* 

Sequence 1545 
Contig_0616_pos_6597_6082,

putative peptide of unknown function 

atgtatggtacaaaattacgttttaatcaagataatatctattttgagaaccctttgatg 
ccatccggtacaatcattcacagttggtatatgttaactgattttgcagaagaccgtgta 
agccctaagctacctattttaaaaaaagggcgccaatatcaatttcaatttaattttgaa
gttgaacctgagggtgcggcttattttaaaatgaaattttatcgtaagaataaagaaatt
cttagtcatcaaattctaaaaaataaaaaagaaaatattgtctatcctagagaagcatat 
tcatatgaattagaacttattaatgctggcatgaatcatctatcttttcacaatataatt 
gtgcaagaattaagagaagatagtaatcaagcttatgaggcaacgcaatatatagatcct 
5 aagaaaaaacttaaagtaattaatcaaataataaccaatataaggacacatcatctagac 
tcatcaaactatcacaggagtgatatgaatggctaa 

Sequence 1546

MYGTKLRFNQDNIYFENPLMPSGTIIHSWYMLTDFAEDRVSPKLPILKKGRQYQFQFNFE
VEPEGAAYFKMKFYRKNKEILSHQILKNKKENIVYPREAYSYELELINAGMNHLSFHNII
VQELREDSNQAYEATQYIDPKKKLKVINQIITNIRTHHLDSSNYHRSDMNG* 

Sequence 1547 

Contig_0616_pos_5858_3699, 
is similar to (with p-value 0.0e+00)

>sp:sp|P47994|SECA_STACA PREPROTEIN TRANSLOCASE SECA SUBUNIT.
>pir:pir|S47149|S47149 secA protein - Staphylococcus carnosus
>gp:gp|X79725|SCSECA_2 S.carnosus (TM300) secA gene. NID: g499333.

atgtatccaaaagatgtgcagattttaggagcaatcgctatgcatcaggggaatattgca
gaaatgcaaacaggagaaggtaagacgcttacagctaccatgcctctgtacttaaatgca 
cttacaggtaaaggtgcttatctaatcacaacaaatgattacttagcaaaacgcgatttt 
ttagaaatgaaaccactatatgaatggctaggcttgtctgtatcattaggatttgtggac 
attccagaatatgaatacgctgaaaatgaaaaatatgaactgtaccaccatgacattgtt 

25 tacacgactaatgggcgactagggtttgattatttaattgataatttagctgatgatatt 
cgtgccaaatttttaccgaaattaaactttgctattattgatgaagtcgattctattata 
ttagacgctgcccaaacgcctttagttatttctggtgcaccacgtgtacaatctaattta
tttcacatcgttaaaaagtttgttgaaacacttgagaaagataaagacttcatagttaat 
tttaataaaaaagaagtgtggctcactgatgagggctcggaaaaagcaagccattatttc 

30 aaagtgaatagtatataccaacagcaatattttgatttagttaggatgattcatttatcg 
cttagagctaagtatttattcaaatataatttagactattttatttttgatggtgagatt 
gtgcttatagatagaataactggtcgtatgctacctggaacaaagcttcagtctggttta 
catcaagctatagaggctctggaaaatgttgaaatttctcaagatatgagtgtgatggca 
accataacattccaaaacttatttaagcaatttgatgaattttcaggtatgactggaaca 

35 ggtaaattaggggaaaaagaattctttgatttatattcaaaagttgttatagagattccg 
actcacagtccgattgaacgagatgatagacctgatagagtatttgctaatggtgacaaa 
aagaacgatgcaattttaaagacagtgattggtatacatgaaactcaacaacctgtgtta 
ctaattacacgtactgcagaagcggcagaatatttttcagctgagttatttaaacgtgat 
atacccaacaatttattaatcgctcaaaatgtagctaaagaggcacaaatgattgctgag 

40 gcgggacaattatctgcagttactgttgctacaagtatggcagggcgtggaactgatata 
aagttatcaaaagaggttcatgatatcggtggcttagcagtgattattaatgaacatatg 
gataatagccgtgttgatcgtcaattaagaggacgctcaggtcgccaaggagatcctgga 
tattcacagatttttgtatcacttgatgatgatttagtaaaacgttggagtaactctaac 
ttggcagaaaataaaaacctccaaacgatggatgcatctaaactagaaagtagtgcactc 

45 tttaaaaaacgtgtaaagtcaattgttaataaagcgcaacgtgtatctgaagagactgct 
atgaaaaatagagaaatggcaaatgaattcgaaaaaagtattagtgttcaacgagataaa 
atttatgctgaacgtaatcacatacttgaagcaagcgattttgatgattttaattttgaa 
cagcttgcacgagatgtgtttacaaaagacgttaaaaatcttgacttaagtagtgaacgt 
gcacttgtgaattatatatacgaaaacttaagttttgtcttcgatgaagatgtatcaaat 

50 attaatatgcaaaatgatgaagaaatcatacaattcttaatacaacaatttactcaacaa 
tttaacaatcgtttagaagttgctgctgattcatatttaaaacttcgtttcattcaaaaa
tcaattttgaaagcgatagatagcgaatggattgaacaagtagataacttacaacaactt 
aaagccagtgtaaacaatcgacaaaatggacagcgtaatgtcatttttgaatatcataaa 
gtggctcttgaaacgtatgaatatatgtctgaagatataaaaaggaagatggttagaaat 

55 ttatgtttaagtattctagcctttgataaggacggagatatggtcattcatttcccataa 



Sequence 1548 

MYPKDVQILGAIAMHQGNIAEMQTGEGKTLTATMPLYLNALTGKGAYLITTNDYLAKRDF
LEMKPLYEWLGLSVSLGFVDIPEYEYAENEKYELYHHDIVYTTNGRLGFDYLIDNLADDI
RAKFLPKLNFAIIDEVDSIILDAAQTPLVISGAPRVQSNLFHIVKKFVETLEKDKDFIVN
FNKKEVWLTDEGSEKASHYFKVNSIYQQQYFDLVRMIHLSLRAKYLFKYNLDYFIFDGEI
VLIDRITGRMLPGTKLQSGLHQAIEALENVEISQDMSVMATITFQNLFKQFDEFSGMTGT
GKLGEKEFFDLYSKVVIEIPTHSPIERDDRPDRVFANGDKKNDAILKTVIGIHETQQPVL
LITRTAEAAEYFSAELFKRDIPNNLLIAQNVAKEAQMIAEAGQLSAVTVATSMAGRGTDI
KLSKEVHDIGGLAVIINEHMDNSRVDRQLRGRSGRQGDPGYSQIFVSLDDDLVKRWSNSN
LAENKNLQTMDASKLESSALFKKRVKSIVNKAQRVSEETAMKNREMANEFEKSISVQRDK
IYAERNHILEASDFDDFNFEQLARDVFTKDVKNLDLSSERALVNYIYENLSFVFDEDVSN
INMQNDEEIIQFLIQQFTQQFNNRLEVAADSYLKLRFIQKSILKAIDSEWIEQVDNLQQL
KASVNNRQNGQRNVIFEYHKVALETYEYMSEDIKRKMVRNLCLSILAFDKDGDMVIHFP* 



Sequence 1549 
Contig_0616_pos_3690_2173,

is similar to (with p-value 3.0e-20)

>sp:sp|P13484|TAGE_BACSU PROBABLE POLY(GLYCEROL-PHOSPHATE)
ALPHA-GLUCOSYLTRANSFERASE (EC 2.4.1.52) (TEICHOIC ACID
BIOSYNTHESIS PROTEIN E). >pir:pir|S06048|S06048 probable rodD
protein - Bacillus subtilis >gp:gp|X15200|BSRODC_1 Bacillus
subtilis rodC operon. NID: g40098. >gp:gp|Z99122|BSUB001970
Bacillus subtilis complete genome (section 19 of 21): from
3597091 to 3809700. NID: g2636029.

gtgttagacatgacgatttataatatcaattttggaatcggttgggccagtagtggtgtt 

25 gaatatgcacaagtgtatcgagcaaaactattaaggcaattaccttatccaacaaaattt 
atatttttagattttattcaatcagaaaatattcaaacactcacaagcaacatagggttt 
aaagatgatgaagttatttggctatatcaatacttcacagacgtaaaaatcgctcctaca 
acgtacacagttgatgatttaatttcagagttaggtaatgaggttactcgaaaagaacaa
aacggtaaagtattacgactctatttaaataatcagcaaacattcgtaacctgctattta 

aaaaatgctaacgaacattacgttgatcgtgcagagtttgtggtgaatggaatgttaatt
aggaaagatttttatagctatgtaagaacattctcggagtattatgctccttttaacaat 
aaagctaaaatatatatgcgtcaattttataatgaaaatggatcaattgcatatcgcgaa 
tatatagatgaagatgaacatgtttttgtgtttgatgatgcacgtttatacagtaaacaa 
gcactcgtcgcatactttatacgtcaattgctattgaattcagaggatatcattattatt 

35 gatcgagcaactgatgtgggacaagccatattagaaaacaagggatccagtaaagttggc 
gtagtcgttcacgctgaacatttcagtgaaggtgcgactgatgggactcacattttatgg 
aataattattatgagtatcaattcgaaaatgcacagcatattgatttctttatcacagcg 
acagatttgcaaaggcagacactaagtgaacaatttaaacaatataagaatgattgtcca 
cgtatacgtacaattccagtaggtagtatagaatcgttacaatatcctgaaaaagaaaga 

40 aaaccatattccatcatgaccgcatcacgtcttgctaacgaaaaacatgttgattggata 
gtggaagctgtgattaaagctaaacatcagttacctcaattgagttttgatatctacgga
caaggagaagaacaagaaaaaattaaaaatattattaccaaacatcgtgctgaggattac 
atacaaattaaaggacatagaaatcttcgtacaatatatcagcaatatgaattattcata 
gcggcctcacaaagtgaagggttcggactgacattaatggaagcggttggctctggttta 

45 ggtatgattggatttgatgtgaattatggcagtccgacatttattcgacatcatcaaaat 
ggctatttgataccgatagattttgaacaagcgtctactgatgatatcacaacgcaaatc 
gctcatatgattattcgatattttgaagatggtcccataagggcacatgaggtatcgtat 
gacattgcagaatcatttaaaacatcgcatattgttgatctatggagacaactcattgaa 
gaggtgctatatgattaa 

50 

Sequence 1550 

VLDMTIYNINFGIGWASSGVEYAQVYRAKLLRQLPYPTKFIFLDFIQSENIQTLTSNIGF 
KDDEVIWLYQYFTDVKIAPTTYTVDDLISELGNEVTRKEQNGKVLRLYLNNQQTFVTCYL 
KNANEHYVDRAEFVVNGMLIRKDFYSYVRTFSEYYAPFNNKAKIYMRQFYNENGSIAYRE
YIDEDEHVFVFDDARLYSKQALVAYFIRQLLLNSEDIIIIDRATDVGQAILENKGSSKVG
VVVHAEHFSEGATDGTHILWNNYYEYQFENAQHIDFFITATDLQRQTLSEQFKQYKNDCP
RIRTIPVGSIESLQYPEKERKPYSIMTASRLANEKHVDWIVEAVIKAKHQLPQLSFDIYG
QGEEQEKIKNIITKHRAEDYIQIKGHRNLRTIYQQYELFIAASQSEGFGLTLMEAVGSGL
GMIGFDVNYGSPTFIRHHQNGYLIPIDFEQASTDDITTQIAHMIIRYFEDGPIRAHEVSY
DIAESFKTSHIVDLWRQLIEEVLYD*

Sequence 1551 
Contig_0616_pos_1748_840,
putative peptide of unknown function

atgacacgtgaaggtaaagaagtgatttatgagaattatgtaactaacgatgtagttgta 
gaatatgaagggaaatcttatttttttgagtcatatacagagtggattaaattttacttg 
agtgaaatgggcattgagataaaagaagttatatttaatactttatcaacaccattttta 
gcaatttatcatttgccgacattgaaaaaaggtattttattttggcaagaacaatctcag 

ggttatgtcccaggaaatatgaaagtcatgttatcaccaaaccttcaaagtcgctttgcc
gttattgtccctaatcagaatgaatacaaattgatcaaggaacaactatctagggaggaa 
caacaggcagcatatgcatctggttacttatatgacacgtataaacggaatcattattct 
aagaatgtattaacattaacaaattcagatcaaataccacatgttgaaacgttggtacgt 
ttgcataaagattatcaatttcacataggcgctaaaactgagatgtcttcaaaattatta 

15 agtttatcgcaatatgaaaatgttaaattatatccaataattaaagaacaaacagttcaa 
accttatatcaacaatgtgacatctatttagatattaatgaggggaacgaaatagggaat 
gctgtaagaagcgcatataatcatcaattgttaattatgggatataaagaggttgttcat 
aatcaagatttcgttgcaatagaaaatcagtttcttgtaaatgatataagtcagttgagt 
aacgctttgaaagagataggaaatcatcgtggtcaatttgaaacacgtttagcactacaa 

20 caacgtcatgctaatgctgtgccggtatcaacatttaaatacgcattagtacaagcatta 
agtggttaa 

Sequence 1552 

MTREGKEVIYENYVTNDVVVEYEGKSYFFESYTEWIKFYLSEMGIEIKEVIFNTLSTPFL
AIYHLPTLKKGILFWQEQSQGYVPGNMKVMLSPNLQSRFAVIVPNQNEYKLIKEQLSREE
QQAAYASGYLYDTYKRNHYSKNVLTLTNSDQIPHVETLVRLHKDYQFHIGAKTEMSSKLL
SLSQYENVKLYPIIKEQTVQTLYQQCDIYLDINEGNEIGNAVRSAYNHQLLIMGYKEVVH 
NQDFVAIENQFLVNDISQLSNALKEIGNHRGQFETRLALQQRHANAVPVSTFKYALVQAL 
SG* 

30 

Sequence 1553 

Contig_0616_pos_734_387,

putative peptide of unknown function 

atggaaaattttgataaaagttatcatgataaaacgggtgatgtattaggggctttaagt
tatctaagtgtatttttcgcacctgtattgtttccattaatcgtatggattgttggacaa
ccaccagcatctacgtattcaagaaatgcattatttaaccatattttgagttgggtgtgt
ttggtattaggacttatatcatttgctgctggactatccttgattgattcgacaaatgga 
gtcgctgtactagtgataggagtaattattggaggtattctacttatcgcttcgcttgta 
ttatttattattaatattgtgaagggtatcaaattattgatgatatag 

40 

Sequence 1554 

MENFDKSYHDKTGDVLGALSYLSVFFAPVLFPLIVWIVGQPPASTYSRNALFNHILSWVC
LVLGLISFAAGLSLIDSTNGVAVLVIGVIIGGILLIASLVLFIINIVKGIKLLMI* 

Sequence 1555

Contig_0618_pos_2737_2393,

putative peptide of unknown function 

gtgacaaaccggaggaaggtggggatgacgtcaaatcatcatgccccttatgatttgggc 
tacacacgtgctacaatggacaatacaaagggcagcgaaaccgcgaggtcaagcaaatcc 
50 cataaagttgttctcagttcggattgtagtctgcaactcgactatatgaagctggaatcg 
ctagtaatcgtagatcagcatgctacggtgaatacgttcccgggtcttgtacacaccgcc 
cgtcacaccacgagagtttgtaacacccgaagccggtggagtaaccatttggagctagcc 
gtcgaaggtgggacaaatgattggggtgaagtcgtaacaaggtag

Sequence 1556

VTNRRKVGMTSNHHAPYDLGYTRATMDNTKGSETARSSKSHKVVLSSDCSLQLDYMKLES 
LVIVDQHATVNTFPGLVHTARHTTRVCNTRSRWSNHLELAVEGGTNDWGEVVTR* 

Sequence 1557
Contig_0619_pos_1598_3097,

is similar to (with p-value 0.0e+00)

>gp:gp|D50097|D50097_1 Bacillus subtilis macronuclear gene for
lactate permease, complete cds. NID: g2258092.
atgaaaggaatttatgctgctatcacaacccttgtggttacattattaattgcaattcca
ttctttaaattaccagtaggaattgcctctggagcagttgttgaaggtttcttccaaggt 
atcttcccaatcggatatattgttattatggcagtattattatataagattactttgaaa 
tcggggcaattcgcaactattcaagacagtattacaagtatttcacaagaccaaagaatt
cagcttcttttaattggtttttcatttaatgcattcttagaaggcgctgcaggatttggt 

10 gttccaattgcaatttgtgcacttttattagcgcaacttggctttagaccattacaagca 
gctatgttatgtttagtagctaacgctgcatctggtgcatttggtgcaattggtattccg 
gttggtgttgtagatacacttaacttacctggtcatgtagaagcgatgggagtttcacaa 
acatcaacattaactttagcaattattaacttctttattcctttcttacttatctttatc 
gtagatggtttcaaaggaattaaagaaactttaccttcaattcttgttgtttctgtcact 

15 tatacagttttacaaggattacttacagtgtttaatggtccagaattagctgatatcatt 
ccatcacttgcttctatgttagcattagctttattctctaagaaattccaacctaagaat 
atctttagagttcaaaaagatgttaaaccagaagcaccgaaaaaacttaaaggtaaagaa 
gtcttatttgcttggagtccattcattatcttaactgtcattgttatgatttggagtgca 
ccttcatttaaagcattatttgcaccaaaaggtaaattatctgctttagttgcaaacttt 

20 gacttacctggtactttcagtaatatttcacacaaaccaattactttatcattaaactta 
attggtcaaacaggtacagcaattctaattacaattattattactgttttaatggctaaa 
aaagtcaactttggtgatgctggtcgcttatttgttgaagcatttaaagaattatggtta 
ccaatcataacaatttgtttcatcttagcaatttcaaaaatcacaacatatggtggttta 
agtaatgctatgggacaaggcatctcaaaagcaggaagcgtattcccaatattatcacca

25 atccttggttggatcggcgtatttatgactggttcagttgttaataacaactctttattc 
gcgccaattcaagcttctgtagcacaacaaattggtacaagtggttcactacttgtagct 
tcaaatacagcaggtggggttgcggcgaaacttatttctccacaatctattgccattgca 
acagcagctgttaaagaagtaggtaaagaatctgaactacttaaaatgacattacgttat 
agtattggattacttgtatttatctgtatctggacatttatcttgtcattcattctgtaa 

30 

Sequence 1558 

MKGIYAAITTLVVTLLIAIPFFKLPVGIASGAVVEGFFQGIFPIGYIVIMAVLLYKITLK
SGQFATIQDSITSISQDQRIQLLLIGFSFNAFLEGAAGFGVPIAICALLLAQLGFRPLQA
AMLCLVANAASGAFGAIGIPVGVVDTLNLPGHVEAMGVSQTSTLTLAIINFFIPFLLIFI
VDGFKGIKETLPSILVVSVTYTVLQGLLTVFNGPELADIIPSLASMLALALFSKKFQPKN
IFRVQKDVKPEAPKKLKGKEVLFAWSPFIILTVIVMIWSAPSFKALFAPKGKLSALVANF
DLPGTFSNISHKPITLSLNLIGQTGTAILITIIITVLMAKKVNFGDAGRLFVEAFKELWL
PIITICFILAISKITTYGGLSNAMGQGISKAGSVFPILSPILGWIGVFMTGSVVNNNSLF
APIQASVAQQIGTSGSLLVASNTAGGVAAKLISPQSIAIATAAVKEVGKESELLKMTLRY
SIGLLVFICIWTFILSFIL* 

Sequence 1559 

Contig_0619_pos_6750_6403,

putative peptide of unknown function

atgtttactgaattagaagatttgtatttccgaggaaatgaaagactattacatcaagct 
ttcaataacctcatcattaatgcaatgaattatgctcctcaaaatagcatgattaatatc 
actctaactagtacaaatcacttgattatatttaatattgaaaatgatggatcgattgca 
gaagaagatgcgaaacatatcttcgatcgtttttataaactgagtgacgaatctagtagt 

50 aatggtctaggtctagccattacccaatcaatcattcatcttcatcatggtagcattact 
ctcacttcagatgataaaacacaatttattgtaaaactatttatttag 

Sequence 1560 

MFTELEDLYFRGNERLLHQAFNNLIINAMNYAPQNSMINITLTSTNHLIIFNIENDGSIA 
EEDAKHIFDRFYKLSDESSSNGLGLAITQSIIHLHHGSITLTSDDKTQFIVKLFI*

Sequence 1561 

Contig_0619_pos_4428_3304,

is similar to (with p-value 5.0e-37)
>gp:gp|AL023702|SC1C3_12 Streptomyces coelicolor cosmid 1C3.
NID: g3169026.

atgcataatatttatgcaatgggcggaacggtaaagtcggtgacacaacttgcaaataca 
ctggcagaaaaaggacatcctgtaacaattatttcagtttttagaggcgcagactctcca 
5 tattttgaattacattcagcaataaaagttaaagtcgtagtggactatcgcttaaagctt 
aaaaatactagagctattacggcaaatcgtatcaaaaagtataccccctttttaaataca 
aaagtgatttctcaatttgagccaggtaaaagtcagttttcgagttatgtagagaaaaaa 
atgattaaagcaatcaggcatactaaaactgatgtactcgttggaacaagagctagcttt 
aatattctcatttctaaatatgctaaagctgaaatagtgaccatcgcaatggaacatatg 

10 aattttgatgctcaccctgatcagtatcaaaaggaaattattgctgcgtaccgtaatatc 
aataagattacaacgttaactgtcgcagaccagcaaaaatatcaatcacaacttaaaact 
cctgtatacgttatacctaatatggttaccgaaaaaagaattgctgctccaaagaaaaat 
cgtattattagcgccggacgtttagaatatgaaaaagggtatgatttattattagagagt 
attcgtttaatacaagaagacttgcgtcaattgaattatgacgttcacatctatggttct 

15 ggtagtaagaaaacatcacttgttgactttattaatcaatatcatttaaatgatttgatt 
aaaatatatgagccaacgcaagaattaaataataaacttgcacaaagtaaaatcgttgtt 
gtaccttcacgcaatgaaggtttcggaatgattattttagaggcaatggtgcaagataat 
atagtaataagttttgaaggcaatgtagggccagattcaatcattaacaacggagataat 
ggttatttagtaaactatgaaaatgtgtctgaacttgcaaaacgtatcgatttaacaaca 

caacattataatgagttagatcacatcattgaaaatagtaaagatacgttgaaacaattt
agtccggatcatatatatcaattatttatgtctatgtttaaataa 

Sequence 1562 

MHNIYAMGGTVKSVTQLANTLAEKGHPVTIISVFRGADSPYFELHSAIKVKVVVDYRLKL 
KNTRAITANRIKKYTPFLNTKVISQFEPGKSQFSSYVEKKMIKAIRHTKTDVLVGTRASF
NILISKYAKAEIVTIAMEHMNFDAHPDQYQKEIIAAYRNINKITTLTVADQQKYQSQLKT
PVYVIPNMVTEKRIAAPKKNRIISAGRLEYEKGYDLLLESIRLIQEDLRQLNYDVHIYGS
GSKKTSLVDFINQYHLNDLIKIYEPTQELNNKLAQSKIVVVPSRNEGFGMIILEAMVQDN
IVISFEGNVGPDSIINNGDNGYLVNYENVSELAKRIDLTTQHYNELDHIIENSKDTLKQF
SPDHIYQLFMSMFK*

Sequence 1563 

Contig_0619_pos_0_1148,

putative peptide of unknown function 

35 atgataaagcaaataaatatttcaaatatggacaaattaaaagagcaaatggaacgtgca 
cttagcgacggctatacgcatgtcatcccctattcaaatgaaattcaaattcatcagtcc 
atgattaaagctatcactttacctaagacttcatttatagttgactatacaattaacaat 
tattatttaaacgattgtaaatacttcgggttggactttgttgattttgaggactgggtt 
aaaaatattaatttatatccaaatgttatttatgaaattaattcaacattagaacttatt 

gataaatttgaagttgaaaatatctttgatttagcattattaacaattcttaaagggcat
atcgcagttgaaggtcatgtcgtattagactttaaaggaccattaaaaacgagcaaggga 
ttttggcgctcatttgaccgtaatgatttaacttatagagataaattcttcttaaacacc 
atcgcttatgcacataaacaaagaatcccatttacgcgtgtaccatttaacgatcacgat 
agtattagatattatgattcagtactacttagtactaaatttaaagctccaagatggtta 

45 gtgactcctattaagaattattcagttaaaaaacacaaagagattagctatatttataaa 
aaggattcatcaaaacttaagaaccacgtcgtttttctaggatttgatttcggctatcga 
ggaaactctaagtatttatttaattactttgttaaacacaatcctatgatagagtcttac 
tttataacagatgagagaacaggaccacattttatttcaactaacgatgaaaatgtgaag 
aatttgattgaaacagctacttttgtcattacggaaagctatattcctgacgacattcac 

50 cctaatggtaaaatcatccaattatggcatgggacacctattaaaaaactatttttagat 
agtaaagagccacaccaaaatttaaatatatataactaccgagctcgaaaatataataaa 
tggacacaacaagattatttaattgtagattcagaagaatcaaaaacatactttgaatca 
gcgtttcctagtcaaaaaattgatatattacctgtaggatatcctagaaataattattta 
ttaaatAA 

55 

Sequence 1564 

MIKQINISNMDKLKEQMERALSDGYTHVIPYSNEIQIHQSMIKAITLPKTSFIVDYTINN 
YYLNDCKYFGLDFVDFEDWVKNINLYPNVIYEINSTLELIDKFEVENIFDLALLTILKGH
IAVEGHVVLDFKGPLKTSKGFWRSFDRNDLTYRDKFFLNTIAYAHKQRIPFTRVPFNDHD
SIRYYDSVLLSTKFKAPRWLVTPIKNYSVKKHKEISYIYKKDSSKLKNHVVFLGFDFGYR
GNSKYLFNYFVKHNPMIESYFITDERTGPHFISTNDENVKNLIETATFVITESYIPDDIH 
PNGKIIQLWHGTPIKKLFLDSKEPHQNLNIYNYRARKYNKWTQQDYLIVDSEESKTYFES 
AFPSQKIDILPVGYPRNNYLLNX 

5 

Sequence 1565 

Contig_0620_pos_622_942,

putative peptide of unknown function 

atgactaaaattagtgttgtcgtatatggagcagaagtcgtttgtgcgagttgtgtaaat 
10 gcacctacatctatagatacttatcaatggcttcaagcattacttttaagaaagtttcct 
caacatcattttgaatttacatatattgacatacgaaatgatactgaaaatttaactgat 
catgatatgcaatttatagaaagaattaatgaagatgaattgttttacccattagttacg 
atgaatgatgaatatgtagcagatggttacatacaatataaacaaataacccgttttatt 
aaatcatattttactatgtaa 

15 

Sequence 1566 

MTKISVVVYGAEVVCASCVNAPTSIDTYQWLQALLLRKFPQHHFEFTYIDIRNDTENLTD 
HDMQFIERINEDELFYPLVTMNDEYVADGYIQYKQITRFIKSYFTM*

Sequence 1567

Contig_0622_pos_6645_7433,

is similar to (with p-value 4.0e-58)

>sp:sp|P45024|YA80_HAEIN HYPOTHETICAL AMINO-ACID ABC TRANSPORTER
BINDING PROTEIN HI1080 PRECURSOR. >pir:pir|I64181|I64181
glutamine-binding periplasmic protein (glnH) homolog -
Haemophilus influenzae (strain Rd KW20) >gp:gp|U32788|U32788_5
Haemophilus influenzae Rd section 103 of 163 of the complete
genome. NID: g1574629.
atgaaaagacttttactttgcattgttgcacttgtttttgttttagcagcctgtggcaac 

30 aattcatctaacaataaagataatcaatcaagcagtaaagacaaggatacgttaagagtt 
ggtacggaaggtacatatgcgccctttacttaccataataaaaaagatcaattaacaggt 
tatgatattgatgtgattaaagcagttgcaaaagaagaaaatcttaaacttaagtttaat
gaaacgtcatgggattcaatgtttgcaggattagatgctggtcgttttgatgttattgca 
aatcaagtgggtgtgaataaagatagagagaaaaaatataaattctctgaaccttacaca 

35 tattcaagtgctgtacttgttgttcgtgaaaatgaaaaagatattacatcattcaatgat 
gtaaaaggtaaaaagttagcacaaacgtttacgtctaattatggtcaattggctaaagat 
aagggtgcggacgttactaaggtagatggatttaatcaatcaatggacttactattatct 
aaacgtgtagatggtacatttaacgacagtttatcttacttagattacagaaaacaaaag 
cctaatgctaaaattaaagcaatcaaaggacatgcagagcaaaataaatcagcatttgca 

40 ttctctaagaaggttgatgaaaaaacgattgagaaatttaataaaggcctagaaaaaatt 
agagataatggtgaattagctaaaattggtaagaaatggtttggtcaagatgtttctaaa 
cctgaataa 

Sequence 1568 

45 MKRLLLCIVALVFVLAACGNNSSNNKDNQSSSKDKDTLRVGTEGTYAPFTYHNKKDQLTG 
YDIDVIKAVAKEENLKLKFNETSWDSMFAGLDAGRFDVIANQVGVNKDREKKYKFSEPYT 
YSSAVLVVRENEKDITSFNDVKGKKLAQTFTSNYGQLAKDKGADVTKVDGFNQSMDLLLS 
KRVDGTFNDSLSYLDYRKQKPNAKIKAIKGHAEQNKSAFAFSKKVDEKTIEKFNKGLEKI 
RDNGELAKIGKKWFGQDVSKPE* 

50 

Sequence 1569 

Contig_0622_pos_7495_8133,

is similar to (with p-value 6.0e-57)

>sp:sp|P45023|YA79_HAEIN HYPOTHETICAL AMINO-ACID ABC TRANSPORTER
PERMEASE PROTEIN HI0179. >gp:gp|U32788|U32788_4 Haemophilus
influenzae Rd section 103 of 163 of the complete genome.
NID: g1574629.

gtgaaatactcaatccctattactttggtcactttcattctaggtttaatcattgcattg 
tttactgcacttatgcgtatatcaaccagtaaattgcttagaggtattgcgcgtgtctat
gtatcaattattcgtggtacacctatgattgtacagttatttattattttttacggtata
ccggagcttggaagattggtaactaacaatgctgataatcaatggacacttgcacctgtt 
attgctgcagtcattggtttatctttaaatgttggtgcttatgcttctgaaattatacga 
ggtggtatattgtctattcctaaaggacaaacagaagcggcttactctataggtatgaac 
5 tatagacaaactgtgcaacgtattatcttaccacaagctattcgtgtatctataccagca 
ctaggaaacacatttttaagtttaattaaagatacatcattacttggatttattcttgtt 
gcagagatgtttagaaaggcacaagaagttgcttcgacaacgtatgagtatctaactatt 
tatttgttagtagctttaatgtattgggtcgtatgttttgtcatctcaattatccaagga 
tggtatgaatcacgcattgaaagagggtatcgctcatga 

10 

Sequence 1570 

VKYSIPITLVTFILGLIIALFTALMRISTSKLLRGIARVYVSIIRGTPMIVQLFIIFYGI
PELGRLVTNNADNQWTLAPVIAAVIGLSLNVGAYASEIIRGGILSIPKGQTEAAYSIGMN
YRQTVQRIILPQAIRVSIPALGNTFLSLIKDTSLLGFILVAEMFRKAQEVASTTYEYLTI
YLLVALMYWVVCFVISIIQGWYESRIERGYRS*

Sequence 1571 

Contig_0622_pos_8265_8861,

is similar to (with p-value 6.0e-59)

>sp:sp|P39456|YCKI_BACSU PROBABLE AMINO-ACID ABC TRANSPORTER
ATP-BINDING PROTEIN. >pir:pir|S52383|S52383 probable ATP binding
protein - Bacillus subtilis >gp:gp|X77636|BSPAAT_3 B.subtilis
putative amino acid transporter gene. NID: g666980.
atgattaatgctttagagatacctactgaaggtacagtgtatgtcaatggcatgacatat 

25 aatgctaaagataagaaatctcaaattaaagtaagacaacaatcaggaatggtttttcaa 
aattataatttatttccacataaatctgcattagaaaacgttatggaaggtcttataaca
gttaaaaagatgaataaagcaacggctaatgaagaagcaatgaatttattgtctaaggtt 
ggattggtacatgttaaagatcaacggccacatgctttatcaggagggcaacaacaacgt 
gtcgcaattgcacgtgcattagccatgaatcctaaagtgatgttatttgatgagccaaca 

30 tctgcgcttgatcctgaattggtcaatgatgtattaaaagtcataaaagaattggctgac 
gaaggtatgacaatggtcattgtgactcacgagatgcgttttgccaaagaagtttccaat
caaattgcttttattcatgagggtgttattgcagaacaaggtacgcctgaagatatattt 
aatcatcccaaaacagaagagcttcagcgatttttaaatgtgattaatgaaaaatag 

Sequence 1572

MINALEIPTEGTVYVNGMTYNAKDKKSQIKVRQQSGMVFQNYNLFPHKSALENVMEGLIT
VKKMNKATANEEAMNLLSKVGLVHVKDQRPHALSGGQQQRVAIARALAMNPKVMLFDEPT
SALDPELVNDVLKVIKELADEGMTMVIVTHEMRFAKEVSNQIAFIHEGVIAEQGTPEDIF
NHPKTEELQRFLNVINEK*

40 

Sequence 1573 

Contig_0622_pos_4380_4018,

is similar to (with p-value 3.0e-43)

>sp:sp|P46899|RL18_BACSU 50S RIBOSOMAL PROTEIN L18.
>gp:gp|L47971|BACRPLP_10 Bacillus subtilis ribosomal protein
(rplPNXEFROQ, rpmCDJ, rpsQNHEMK) genes, integral membrane protein
(secY) gene, adenylate kinase (adk) gene, methionine
aminopeptidase (map) gene, initiation factor 1 (infA) gene, RNA
polymerase alpha (rpoA) gene. NID: g1044970.
atgatcagtaaaattgataaaaacaaagtacgtttgaaaagacatgctcgtgttcgtact
aaattatcaggtacagctgaaaagccacgtttaaatgtgtatcgttcaaacaaacacatc
tatgcacaaattattgatgacgttaaaggcgtaacacttgctcaagcatcatcacaagat
aaagatattgcaaacacatcagcttcaaaagttgacttagcaactactgttggtcaagaa
attgctaaaaaagctaacgataaaggtattaaagaaatcgtcttcgatcgcggaggatat
ttataccacggacgtgttaaagctttagctgatgctgcaagagaaaatggattagaattt
taa

Sequence 1574 

MISKIDKNKVRLKRHARVRTKLSGTAEKPRLNVYRSNKHIYAQIIDDVKGVTLAQASSQD
KDIANTSASKVDLATTVGQEIAKKANDKGIKEIVFDRGGYLYHGRVKALADAARENGLEF
*

Sequence 1575 
Contig_0622_pos_3998_3498,

is similar to (with p-value 3.0e-53)

>sp:sp|P21467|RS5_BACSU 30S RIBOSOMAL PROTEIN S5 (BS5).
>gp:gp|L47971|BACRPLP_11 Bacillus subtilis ribosomal protein
(rplPNXEFROQ, rpmCDJ, rpsQNHEMK) genes, integral membrane protein
(secY) gene, adenylate kinase (adk) gene, methionine
aminopeptidase (map) gene, initiation factor 1 (infA) gene, RNA
polymerase alpha (rpoA) gene. NID: g1044970.
>gp:gp|Z99104|BSUB0001_133 Bacillus subtilis complete genome
(section 1 of 21): from 1 to 213080. NID: g2632267.

15 atggctcgtagagaagaagaaactaaagaatttgaagaacgcgttgttacgattaaccgt 
gttgctaaagttgtaaaaggtggtcgtcgtttccgtttcactgcattagtggttgttgga 
gataaaaatggtcgtgtaggtttcggtactggtaaagcgcaagaggtaccagaagctatc 
aaaaaagctgttgaagcagctaaaaaagatttagtagttgttccacgtgtagaaggtacg 
actcctcatactataactggtcaatatgggtcaggtagcgtatttatgaaaccagctgca

20 cctggtacaggagttatcgctggtggaccagttcgtgccgtattagagttagcaggaatt 
actgatatcttaagtaaatctttaggatcaaataatcctattaatatggttcgtgcgact 
atcaacggtttacaaaacttaaaaaatgcagaagatgttgctaaattacgtggcaaatct 
gtagaagaattatacaattaa 

Sequence 1576

MARREEETKEFEERVVTINRVAKVVKGGRRFRFTALVVVGDKNGRVGFGTGKAQEVPEAI
KKAVEAAKKDLVVVPRVEGTTPHTITGQYGSGSVFMKPAAPGTGVIAGGPVRAVLELAGI
TDILSKSLGSNNPINMVRATINGLQNLKNAEDVAKLRGKSVEELYN*

Sequence 1577

Contig_0622_pos_2773_1550,

is similar to (with p-value 0.0e+00)

>sp:sp|Q05217|SECY_STACA PREPROTEIN TRANSLOCASE SECY SUBUNIT.
>pir:pir|S30115|S30115 secY protein - Staphylococcus carnosus
>gp:gp|X70086|SCSECY_1 S.carnosus secY gene. NID: g49188

atgttagttatttttaaaataggaacgtatattcctgctccaggagttaatcctgaagcc 
tttaatcatccacagggatctcaaggtgccactgagttattaaatacttttggtggcggt 
gccttgaaacgtttctcaatatttgcgatgggaatcatgccttatatcactgcatccatc 

40 gtcatgcaattactgcaaatggatattgttcctaaatttacagagtgggcaaaacaaggt 
gaaatgggtagaagaaaaattaataacgtaactcgttattttgctataattttagctttt 
atccaatctataggtatggctttccaatttaataactatctcaaaggacaacttattata
gaaaagtctgttatgagttatttattaattgcagttgtattaacagcgggaacagctttc 
ttaatttggcttggtgaccaaatcacacagtttggtgttggtaacggtatttctcttatc 

45 atctttgcaggtatattatcaactttaccttcgagtctagaacaatttgcacaatcagtg 
tttgtaggtcaagacgatacttcacttgcttggctgaaaatactaggattgattgtagcc 
ttgattttactaacagtaggcgcaatatttgttcttgaagctaaacgtaaaatacctatt 
caatatgcaaagaaacaatctgctcaacgattaggttcacaagcaacttatctacctttg 
aaagttaactctgccggtgttattccagttatctttgcgatggcgtttttcttgttacca 

50 agaacattgactttattcttcccgaaagcagaatgggcacagaatattgctgatactgcc 
aacccatcaagtaatattggaatgattatttatgtagttttaattattgcatttgcatat 
ttttatgcttttgtacaagttaatcctgaaaaaatggcagataaccttaaaaagcaaggt 
agttatgtcccaggaattagacctggtgaacaaacaaaaaaatatattactaaagtactt 
tatagattgacttttgttggttcaattttcttagcagctatagctattttacctataatt 

55 gcgactaaatttatgggcttaccacaatcaattcaaattggtggtacgagtcttttgatc 
gttattggtgtagctattgaaactatgaaaactttagaagcacaagtcactcaaaaagaa 
tataaaggctttggtggtagataa 

Sequence 1578 






MLVIFKIGTYIPAPGVNPEAFNHPQGSQGATELLNTFGGGALKRFSIFAMGIMPYITASI
VMQLLQMDIVPKFTEWAKQGEMGRRKINNVTRYFAIILAFIQSIGMAFQFNNYLKGQLII
EKSVMSYLLIAVVLTAGTAFLIWLGDQITQFGVGNGISLIIFAGILSTLPSSLEQFAQSV
FVGQDDTSLAWLKILGLIVALILLTVGAIFVLEAKRKIPIQYAKKQSAQRLGSQATYLPL
KVNSAGVIPVIFAMAFFLLPRTLTLFFPKAEWAQNIADTANPSSNIGMIIYVVLIIAFAY
FYAFVQVNPEKMADNLKKQGSYVPGIRPGEQTKKYITKVLYRLTFVGSIFLAAIAILPII
ATKFMGLPQSIQIGGTSLLIVIGVAIETMKTLEAQVTQKEYKGFGGR*

Sequence 1579 
Contig_0622_pos_1532_885,

is similar to (with p-value 2.0e-82)

>sp:sp|P16304|KAD_BACSU ADENYLATE KINASE (EC 2.7.4.3)
(ATP-AMP TRANSPHOSPHORYLASE). >pir:pir|JS0492|JS0492 adenylate
kinase (EC 2.7.4.3) - Bacillus subtilis >gp:gp|L47971|BACRPLP_15
Bacillus subtilis ribosomal protein (rplPNXEFROQ, rpmCDJ,
rpsQNHEMK) genes, integral membrane protein (secY) gene,
adenylate kinase (adk) gene, methionine aminopeptidase (map)
gene, initiation factor 1 (infA) gene, RNA polymerase alpha
(rpoA) gene. NID: g1044970. >gp:gp|Z99104|BSUB0001_137 Bacillus
subtilis complete genome (section 1 of 21): from 1 to 213080.
NID: g2632267. >gp:gp|D00619|BACSECY_4 Bacillus subtilis genes
for ribosomal proteins, SecY, adenylate kinase and methionine
amino peptidase, complete cds. NID: g216336.
atgaatatcattttaatgggcttacctggtgcaggtaaagggactcaggcgagtgaaatt 

25 gttaagaaattcccaataccacatatttctactggtgacatgttcagaaaagcgattaaa 
gatgaaacagatttaggaaaagaagctaaatcatatatggatcgtggagaattagttcct
gatgaagttactgtaggtatcgttaaagaaagaatttctgaagacgatgcaaaaaaagga 
ttcttgttagatggattcccaagaactatagatcaagctgagtcattaagtcaaattatg 
tctgagcttgatagagaaattgatgctgtcattaatatcgaagttcctgaggaagaatta 

30 atgaatcgtcttacaggtcgtcgtatctgtgagaaatgtggtacaacatatcatcttgta 
tttaatcctccaaaggttgatggtatatgtgatatcgatggtggaaagttatatcaacgt 
gaagatgacaatccagaaacagtatctaatcgtttgagcgttaatgttaaacaatctaag 
cctattttagaatattacaacaacaaaggtgtcttgaaaaacattgatggttcaaaagat 
attgacgaagtaaccaacgatgtcattgatatcttagatcatttataa 

35 

Sequence 1580 

MNIILMGLPGAGKGTQASEIVKKFPIPHISTGDMFRKAIKDETDLGKEAKSYMDRGELVP
DEVTVGIVKERISEDDAKKGFLLDGFPRTIDQAESLSQIMSELDREIDAVINIEVPEEEL
MNRLTGRRICEKCGTTYHLVFNPPKVDGICDIDGGKLYQREDDNPETVSNRLSVNVKQSK
PILEYYNNKGVLKNIDGSKDIDEVTNDVIDILDHL*

Sequence 1581 

Contig_0623_pos_9597_9896,
putative peptide of unknown function
gtggtggtactgctgcgtaacttgtgcccagaaaatcaacgcctaaatattgtcttttat
ctgcccacatttgacgaacttccattgttagagggacttgatttgtactttctaaagcaa 
caatatcgacattcatatcactatctaatacaacttttacagcctcaggatcccaaaatg 
cattccattcagctgtaccatcatgttctggttcttcaacatttcctttgtctaaaaacg 
ttccacccatccaaactaatttctctatattttttaaaattgagttgtcatattttatag 

50 

Sequence 1582 

VVVLLRNLCPENQRLNIVFYLPTFDELPLLEGLDLYFLKQQYRHSYHYLIQLLQPQDPKM
HSIQLYHHVLVLQHFLCLKTFHPSKLISLYFLKLSCHIL* 

55 

Sequence 1583 

Contig_0623_pos_13964_14308,
putative peptide of unknown function 

gtgacaaaccggaggaaggtggggatgacgtcaaatcatcatgccccttatgatttgggc
tacacacgtgctacaatggacaatacaaagggcagcgaaaccgcgaggtcaagcaaatcc
cataaagttgttctcagttcggattgtagtctgcaactcgactatatgaagctggaatcg 
ctagtaatcgtagatcagcatgctacggtgaatacgttcccgggtcttgtacacaccgcc 
cgtcacaccacgagagtttgtaacacccgaagccggtggagtaaccatttggagctagcc 
gtcgaaggtgggacaaatgattggggtgaagtcgtaacaaggtag

Sequence 1584 

VTNRRKVGMTSNHHAPYDLGYTRATMDNTKGSETARSSKSHKVVLSSDCSLQLDYMKLES 
LVIVDQHATVNTFPGLVHTARHTTRVCNTRSRWSNHLELAVEGGTNDWGEVVTR* 

10 

Sequence 1585 

Contig_0623_pos_15682_15359, 
putative peptide of unknown function 

gtgaaaaaagtggaaaatacaaatagtaatatgcataaaaatgacaatagttacgtttta 
15 tcagcattgagttatttaagtatctttttcgcacctgtcattttacctttatttgtatgg 
atacttgctgatgaaccaacatcagaacatggcaaaaaagcattcattaatcatattatg 
acttgggttagtttttttataggtagattggcatttattttttctaaagaagtctttgat 
aaacctttggatcatcaattattaatattcatccgctcacttttcaacgtaagtcggttc 
ggtcctccattcagtgttacctga 

20 

Sequence 1586 

VKKVENTNSNMHKNDNSYVLSALSYLSIFFAPVILPLFVWILADEPTSEHGKKAFINHIM 
TWVSFFIGRLAFIFSKEVFDKPLDHQLLIFIRSLFNVSRFGPPFSVT* 

25 Sequence 1587 

Contig_0623_pos_11160_10618, 
putative peptide of unknown function 

atggttggtatgagagcgactgagagccatgatttaattcttgatgatgtttatgttcct 
aatgaaaattttgtagaatcaaaacgtgaatcaagacctaatggttggcttcttcatata 

30 ccgagctgttatctaggtattgcacaagcagctcgtgactatgcagtagattttgcaaaa 
aattatcgtcctaatagtattacaggtacgattgatagtttacctacagtgcaacaaaat 
ttagggaaaatggaaagtttattactttctgcaagacattttctatggagtacagctaga 
gggtatcaatcatatacagaggatgcacaaatatggaatgaaacctcagcaagtaaagtg
gtggtaatgaaccaaggtatagaaatcgttgatttagctatgagaatagttggagctaag 

35 agtctagaaatgagcagacctcttcaacggtactatagagatatacgtgctggattacat 
aatccaccaatggaagatatggcttacactaatattgctaaaagtattacaaacaaactt 
taa 

Sequence 1588 

MVGMRATESHDLILDDVYVPNENFVESKRESRPNGWLLHIPSCYLGIAQAARDYAVDFAK
NYRPNSITGTIDSLPTVQQNLGKMESLLLSARHFLWSTARGYQSYTEDAQIWNETSASKV
VVMNQGIEIVDLAMRIVGAKSLEMSRPLQRYYRDIRAGLHNPPMEDMAYTNIAKSITNKL
*

Sequence 1589

Contig_0623_pos_9215_8700, 

putative peptide of unknown function 

atgactgctatcatttgtattttaggtatagtaccaagtgtacctctgccatttatgcca 
gtccctattgtattacagaatataggaatctttttggcaggaattattttagggcgaaag 

50 cttggtactactagtgttattgtctttttactattggtagctacaggtttgccagtgctt 
tctggaggccgtgggggaattggcgtatttgcaggaccttcggcaggattcttattctta 
tatcctgttgtagcttactttataggcattattcgtgatgcatatttgcataaaattaat 
ttcttagtgatttttatagctacactagttatcggtgtattaggattagatatattaggt 
actatcattatgggctttattatacatatacctatctctaaggcatttatattatcattt

acatttatgccaggtgatatcattaaggctattattgcaagtttaataggtgcagcaatt
ttaaatcattcacgtttcaagactcttattcaataa 

Sequence 1590 

MTAIICILGIVPSVPLPFMPVPIVLQNIGIFLAGIILGRKLGTTSVIVFLLLVATGLPVL
SGGRGGIGVFAGPSAGFLFLYPVVAYFIGIIRDAYLHKINFLVIFIATLVIGVLGLDILG
TIIMGFIIHIPISKAFILSFTFMPGDIIKAIIASLIGAAILNHSRFKTLIQ*

Sequence 1591 
Contig_0623_pos_8486_7908,

putative peptide of unknown function 

atgcatctaaatcaatcatcctatcatctttcattttcatttccagctaatgaaaagata
gatgaagtattgttggaaaaaatacgtgaactaggttttcagataggagtacttgagctc 
tatgtcattgaagctaaagcgttaaaagagctctcccgcaaaagagacgtagatattcaa 

cttgtatcaagcaataatatcaatgattaccttcatgtttatgatgcgtttgcacggcct
tttggtgatagctatgccaacatggttaaacaacatatttatagctcatataacttggac 
gatattgaacgtttagttgcatatgttaaccatcaaccagttggaatagtcgatattata 
atgacggataaaacaatagaaatagatggttttggggttttagaagaattccaacatcaa 
ggtatcggttctgaaatacaagcttacgttggacgtatggctaatgagcgacctgttatt 

cttgttgcagatggaaaagatactgctaaagatatgtatctaagacaaggatatgtatat
caaggttttaagtatcatattttaaaagaaaatatttaa 

Sequence 1592 

MHLNQSSYHLSFSFPANEKIDEVLLEKIRELGFQIGVLELYVIEAKALKELSRKRDVDIQ 
LVSSNNINDYLHVYDAFARPFGDSYANMVKQHIYSSYNLDDIERLVAYVNHQPVGIVDII
MTDKTIEIDGFGVLEEFQHQGIGSEIQAYVGRMANERPVILVADGKDTAKDMYLRQGYVY
QGFKYHILKENI*

Sequence 1593 
Contig_0623_pos_6439_5762,

is similar to (with p-value 5.0e-80)

>gp:gp|AB015195|AB015195_4 Staphylococcus aureus gene for Ly
tN and Eprh, complete cds. NID: g3767591.

atggacattattaaagaaatgaaaaaagcaaatgttagttttacaacatactttgatgat 
aactacccttctctttgcaaagaaatgtatgattatccttatgtgatattctacaaagga
aatccacagttctttaatcattctcactctttagctgtaattggctcacgtaatgccaca 
caatatacaagtcaatctttaaactatctttttccttcatttagacaattaaatatggcg 
attgtttctggattagcgcgcggtgcagatagtgtagcacatcaaaccgcacttaaatac 
ctattaccaactattggcgtacttggatttggccattgttatcattatcctaaagcaacc 
ttaaatttaagaactaaagttgaaaggaatggcttagtgataagtgaatatccaccattt
tctcctataagtaagcataaatttcctgaaagaaacaggcttataagtggtctgtccaga 
ggggtacttataactgaggctgaagaaagaagtggtagtcaaatcactatcgattgtgct 
ttagagcaaaatagaaatgtttatgttctacctggttcaatgttcaacaaaatgactaaa 
ggtaatttaagaaggataaatgaaggtgctcaagttgttatagatgaaagtagtatatta 
tatgattatctattttag

Sequence 1594 

MDIIKEMKKANVSFTTYFDDNYPSLCKEMYDYPYVIFYKGNPQFFNHSHSLAVIGSRNAT 
QYTSQSLNYLFPSFRQLNMAIVSGLARGADSVAHQTALKYLLPTIGVLGFGHCYHYPKAT 
LNLRTKVERNGLVISEYPPFSPISKHKFPERNRLISGLSRGVLITEAEERSGSQITIDCA
LEQNRNVYVLPGSMFNKMTKGNLRRINEGAQVVIDESSILYDYLF* 

Sequence 1595 
Contig_0623_pos_5489_3510, 

is similar to (with p-value 0.0e+00)

>sp:sp|P39814|TOP1_BACSU DNA TOPOISOMERASE I (EC 5.99.1.2) (
OMEGA-PROTEIN) (RELAXING ENZYME) (UNTWISTING ENZYME) (SWIVEL
ASE). >gp:gp|L27797|BACSMF_2 Bacillus subtilis (smf) gene, 3
' end, DNA topisomase gene, complete cds, (gid) gene, 5' end
. NID: g520751. >gp:gp|Z99112|BSUB0009_82 Bacillus subtilis
complete genome (section 9 of 21): from 1598421 to 1807200.
NID: g2633902. >gp:gp|AJ000975|BSYLQGCOD_6 Bacillus subtilis
ylqg to codV gene region. NID: g2462964.

atgggacatgttcgtgacttaccaagaagtcaaatgggtgtcgacactgaagataactat
gaaccaaaatatattacaattcgtggcaaaggtcctgtagttaaagatttaaaaaaacat
gcgaaaaaagcaaaaaaaatatttttagctagtgaccctgaccgtgaaggtgaagcgatt 
gcttggcatttatcaaaaattttagaattagaagatagcaaagaaaatagagtagtattt 
aatgaaattacaaaagatgctgttaaagatagttttaagcatcctcgtggtattgaaatg 
gatttagttgacgcgcaacaagcacgtcgtattttagatagactcgttggttataatatt
tctccagtattatggaagaaagttaaaaaagggctttctgctgggagagttcagtcagtt 
gctttacgtttagtcattgatcgtgaaaatgaaattcgtaattttaaacctgaagagtat 
tggtccattgaaggtgaatttagatacaagaaatctaaatttacagctaaatttctacac 
tataaaaataaaccttataagctaaacaacaaagacgatgttcaaaggattactgaagca 

ttaaatggtgatcaatttgaaatcacaaatgtgaatcgtaaagaaaaaacacgttatcct
gctcatccatttactacatcaaccttacaacaagaagctgcacgtaaactaaattttaaa 
gcacgcaagacaatgatgttagcacaacaattatacgaaggtattgacttaaagcgtcaa 
ggtacagtaggtttaattacgtatatgcgtaccgattctactcgtatctcaacttctgca 
aaatcagaagcgcagcaatatataaatgataaatatggtgaacagtacgtgtctcagcgt 

aaatcatcgggtaaacagggcgatcaagatgctcacgaagctattagacctactagtaca
atgcgaactcctgatgacatgaaagcttttcttactagagatcaacaccgtctatacaaa 
ttaatttgggaaagatttgtagcaagtcagatggctccagctattttggatacagtagct 
ttagatgtaactcaaaacgacattaaatttagagctaatggtcaaactattaaatttaaa
ggttttatgacactatatgtagaagcaaaagatgataaagagaatgataaagaaaataag 

cttcctcaactagataaaggagataaggtaactgcgacaaagattgaaccggcacaacac
tttacacaacctcctcctcgttatactgaggcgcgtttagttaaaacgcttgaggaactt 
aaaattggaagaccttcaacatatgctccaaccattgatacgattcaaaagcggaactac 
gtcaagttagaaagtaaacgcttcatcccaactgaattaggagaaattgtttatgagcaa 
gttaaagaatacttcccagaaattattgatgtagaattcactgtaaacatggaaacatta 

cttgataaaattgccgaaggtgacatgaattggcgtaaagtaataggagacttctacaac
agttttaaacaagatgttgaacgcgcagaatctgaaatggaaaagattgagattaaagac
gagccagctggtgaagattgtgaagtctgtggttctccaatggttattaaaatgggaaga 
tatggtaagtttatggcatgttcgaactttccggactgtcgtaacaccaaagcaattgtc 
aaaacgattggtgtcacatgtccgaagtgtaatgaaggagatgtcgtagaacgtaaatca 

aagaaaaatagaattttctatggttgttctagatatccagaatgtgattttatttcttgg
gataaacctgttggaagagattgtcctaagtgtcatcattaccttgtgaacaagaaaaaa 
ggtaaaagtagtcaagttgtgtgctccaactgtgattatgaagaagaagttcaaaaatag 

Sequence 1596

MGHVRDLPRSQMGVDTEDNYEPKYITIRGKGPVVKDLKKHAKKAKKIFLASDPDREGEAI
AWHLSKILELEDSKENRVVFNEITKDAVKDSFKHPRGIEMDLVDAQQARRILDRLVGYNI 
SPVLWKKVKKGLSAGRVQSVALRLVIDRENEIRNFKPEEYWSIEGEFRYKKSKFTAKFLH 
YKNKPYKLNNKDDVQRITEALNGDQFEITNVNRKEKTRYPAHPFTTSTLQQEAARKLNFK 

ARKTMMLAQQLYEGIDLKRQGTVGLITYMRTDSTRISTSAKSEAQQYINDKYGEQYVSQR
KSSGKQGDQDAHEAIRPTSTMRTPDDMKAFLTRDQHRLYKLIWERFVASQMAPAILDTVA 
LDVTQNDIKFRANGQTIKFKGFMTLYVEAKDDKENDKENKLPQLDKGDKVTATKIEPAQH 
FTQPPPRYTEARLVKTLEELKIGRPSTYAPTIDTIQKRNYVKLESKRFIPTELGEIVYEQ
VKEYFPEIIDVEFTVNMETLLDKIAEGDMNWRKVIGDFYNSFKQDVERAESEMEKIEIKD

EPAGEDCEVCGSPMVIKMGRYGKFMACSNFPDCRNTKAIVKTIGVTCPKCNEGDVVERKS
KKNRIFYGCSRYPECDFISWDKPVGRDCPKCHHYLVNKKKGKSSQVVCSNCDYEEEVQK*



Sequence 1597 
Contig_0623_pos_3509_2178,

is similar to (with p-value 0.0e+00)

>gp:gp|Z99112|BSUB0009_83 Bacillus subtilis complete genome
(section 9 of 21): from 1598421 to 1807200. NID: g2633902. >
gp:gp|AJ000975|BSYLQGCOD_7 Bacillus subtilis ylqg to codV ge
ne region. NID: g2462964.

atgagtgattttaggaggggcaaaatgacgcaaaaagtaaacgttgtaggagctggttta 
gctggctctgaagctgcatatcaattagctcaacgtggaattaaagtgaatttaattgag 
atgcgtccagttaaacagacaccggcgcaccatacagataaatttgctgaattggtatgt 
tcaaattcattgagaggtaatgcacttacaaatgctgttggtgttcttaaagaggaaatg
agacatttagactcgttaattatcacatcagcagataaagcacgtgtgccagcgggtggt
gctttagcagtggatagacatgattttgctggctatattacagataccttaagaaaccac 
cctaacatcactgtattaaatgaagaagttaatcatataccagaaggttatacgattatt 
gcaactggccctctaactactgagcatttagctcaagaaattgttgatattactggtaaa 
gatcaattgtatttttacgatgctgccgcaccaataatagaaaaagattcaattaatatg
gataaagtatatttgaaatcacgttatgataaaggtgaagcagcgtatcttaattgtcct 
atgactgaagaagagtttaaccggttttatgatgcagtattagaagctgaagttgcacca 
gtcaatgagtttgaaaaagaaaaatattttgaagggtgtatgccttttgaagtcatggct 
gaaagagggcgaaaaactttgttatttggtccgatgaaacctgttggacttgaagatcct 

aagactgggaaacgcccttatgcagttgttcaattaagacaagatgatgcagctggaaca
ttatataatattgttggctttcaaacacatttaaaatggggtgcgcaaaaagaagtcatt 
cgtttaattccaggattagaaaatgttgatattgtaagatatggtgtgatgcaccgaaat 
acctttattaattcacctgatgttttaaacgaaaaatatgaattaaaaggacatgataat 
ttatattttgctggacaaatgactggcgttgaaggttatgttgaaagtgctgccagtgga 

ttagttgcaggtattaatcttgcgcataaaattttagacaaaggggaagttattttccct
agagagacaatgataggtagtatggcttactacatatcacatgccaaaaatgagaagaat 
tttcaacctatgaatgccaattttggtcttttaccatctctcgaaaaacgtattaaagat 
aaaaaagaaagatatgaaacacaagccaaaagagcgttagagtatttagataattacaaa 
caaacgctgtaa 


Sequence 1598 

MSDFRRGKMTQKVNVVGAGLAGSEAAYQLAQRGIKVNLIEMRPVKQTPAHHTDKFAELVC 
SNSLRGNALTNAVGVLKEEMRHLDSLIITSADKARVPAGGALAVDRHDFAGYITDTLRNH
PNITVLNEEVNHIPEGYTIIATGPLTTEHLAQEIVDITGKDQLYFYDAAAPIIEKDSINM
DKVYLKSRYDKGEAAYLNCPMTEEEFNRFYDAVLEAEVAPVNEFEKEKYFEGCMPFEVMA
ERGRKTLLFGPMKPVGLEDPKTGKRPYAVVQLRQDDAAGTLYNIVGFQTHLKWGAQKEVI
RLIPGLENVDIVRYGVMHRNTFINSPDVLNEKYELKGHDNLYFAGQMTGVEGYVESAASG
LVAGINLAHKILDKGEVIFPRETMIGSMAYYISHAKNEKNFQPMNANFGLLPSLEKRIKD
KKERYETQAKRALEYLDNYKQTL* 


Sequence 1599 

Contig_0623_pos_1923_967,

is similar to (with p-value 4.0e-47)

>sp:sp|P39776|CODV_BACSU PROBABLE INTEGRASE/RECOMBINASE CODV
. >gp:gp|U13634|BSU13634_2 Bacillus subtilis JH642 dipeptide
permease operon regulators, codV, codW, codX, and codY gene
s, complete cds. NID: g535347. >gp:gp|Z99112|BSUB0009_84 Bac
illus subtilis complete genome (section 9 of 21): from 15984
21 to 1807200. NID: g2633902.

atgttaaaggttgaaagaaacttttcagagtatacgttaaaatcttatcatgatgattta
gttcaatttaacaactttttagaaagagaacatttacaacttgagacttttgaatataaa 
gatgctagaaactatttggcttttttatattctaatcaattaaaaagaactacggtgtca 
agaaagatatcaactttacgtaccttctatgaattttggatgactcaagataattcaatt 
attaatccctttgttcaactagtgcatcctaaaaaagagaagtatttacctcaattcttt 

tatgaagaagaaatggaagcactttttcaaactgtagagcatgataataaaaaaggcata
cgagacaaagttattattgaattgttatatgcaacaggaatacgtgtgtctgaattaata 
aatattaaactaaaagatatagatatgaacttaccaggtgtaaaagttttaggtaaagga 
aataaggaaaggtttatcccttttggagagttctgtagacagagtatagaaagatactta 
gaagaattccaacctaaacaattagccaatcatgattatttaattgtaaatatgaaaggt 

gatcctatcaccgaaagaggagtaagatatgtacttaatgatgtcgttaaaagaaccgct
ggcgtcaatgacatacatcctctatttatgattatagcgacgataggtgaaatggtatat 
tctccaattcttgaagaaaatcgttttaaaatggttccttctcataaaagagggacatat 
tcagcagtgcatgctttaggatttaacctagctgaattacttgcaagatttggaattata 
ttaggagtgtttttaacttcaatggagatggggatctatatgtttgttttattattacta 

ggtggcatgtcactttacattgcagtgagtcgttttaataatacaaattcacaataa

Sequence 1600 

MLKVERNFSEYTLKSYHDDLVQFNNFLEREHLQLETFEYKDARNYLAFLYSNQLKRTTVS 
RKISTLRTFYEFWMTQDNSIINPFVQLVHPKKEKYLPQFFYEEEMEALFQTVEHDNKKGI
RDKVIIELLYATGIRVSELINIKLKDIDMNLPGVKVLGKGNKERFIPFGEFCRQSIERYL
EEFQPKQLANHDYLIVNMKGDPITERGVRYVLNDVVKRTAGVNDIHPLFMIIATIGEMVY
SPILEENRFKMVPSHKRGTYSAVHALGFNLAELLARFGIILGVFLTSMEMGIYMFVLLLL 
GGMSLYIAVSRFNNTNSQ* 


Sequence 1601 

Contig_0623_pos_0_694,

putative peptide of unknown function 

gtgacatactgggtaaagttaagtcgcgacattgagttacgtagattaatgtatgcatta 
ttagatgtcgttcaaagtcaacctgtgttgcgtacacagtttgtgacagatgattttaat
caactcaagataaatttaagagatttttttccatttattgaaattaaagaagttaatgaa 
atgtcgcaaagcatagatttagaagcattctttacacgtaatttaaattcctaccatttc 
aatcaattacctctgtttaattttaagatatatcaatttctagatggcgcctacctactt 
ttagatttccacgctactatttttaatgaaagtcaattaactccatttttacaacaatta 
aatattgcttatacccactctttaaaaagtgaatatagtatctcggatttttataattgg
attaaagaaatgaatcaaaagatggatcaaaatcaagttgtgtgtccatcaaagcacttc 
aacgtattgaatgcagacggtgataattacgcttacatacctgtaaagaatacatctgaa 
aagaaaaaaatgtgttctttgcatgcagaactaccatctttagacattgatgcgtggatt 
gtaagtatttacttagcgcatcattttataagtcagtcttctgatgtgacgttaggcatc 
catttttcgatagataataaaaatactgagaata

Sequence 1602 

VTYWVKLSRDIELRRLMYALLDVVQSQPVLRTQFVTDDFNQLKINLRDFFPFIEIKEVNE 
MSQSIDLEAFFTRNLNSYHFNQLPLFNFKIYQFLDGAYLLLDFHATIFNESQLTPFLQQL
NIAYTHSLKSEYSISDFYNWIKEMNQKMDQNQVVCPSKHFNVLNADGDNYAYIPVKNTSE
KKKMCSLHAELPSLDIDAWIVSIYLAHHFISQSSDVTLGIHFSIDNKNTENX

Sequence 1603 

Contig_0625_pos_5621_6001,

putative peptide of unknown function

gtgatttcatctactggttgttttgttattttttctgttggttcaccttcgccaactttt 
tcccctgttaatgggttcttagttgttggtgttgtaattgtttttgttcctggttcacct 
ttctgtttaacacgctcttcacctggttttaaatcaggattgaattcacgtttcttgtcg 
aatggaatttcttccgttgacgtaatcggatctccatcaactggaccatattttgtcaca 

tcatccactggtggtgtgactacttcgcctgtatcaggatttttaactcctggtttacct
ggaacgtcctcttggctacctttcggtgcatttggatcaaattcatccttatggcctggc 
ttgatttcttcgccaccataa 

Sequence 1604 

VISSTGCFVIFSVGSPSPTFSPVNGFLVVGVVIVFVPGSPFCLTRSSPGFKSGLNSRFLS
NGISSVDVIGSPSTGPYFVTSSTGGVTTSPVSGFLTPGLPGTSSWLPFGAFGSNSSLWPG 
LISSPP* 

Sequence 1605 
Contig_0625_pos_6773_7471,

putative peptide of unknown function 

gtgatttcatctactggttgttttgttactttttctgttggttcaccttcgccaactttt 
tcccctgttaatgggttcttagttgttggtgttgtaattgtttttgttcctggttcacct 
ttctgtttaacgcgctctttacctggttttaaatcaggattgaattcacgtttcttgtcg 

aatggaatttcttccgttgacgtgatcggatctccatcaactggaccatattttgtcaca
tcatccactggtggtgtgactacttcgcctgtatcagggtttttaactcctggtttacct 
ggaacgtcctcttggctacctttcggtgcatttggatcaaattcatccttatggcctggc 
ttgatttcttcgccaccatattctgtgatttcatctactggttgttttgttattttttct 
gttggttcaccttcgccaactttttcccctgttaatgggttcttagttgttggtgttgta 

attgtttttgttcctggttcacctttttgtttaacacgctcttcacctggttttaaatca
ggattgaattcacgtttcttgtcgaatggaatttcttccgttgacgtgatcggatctcca 
tcaactggaccatattttgtcacatcatccacaggtggagtaactacttcgcctgtatca 
ggatttttaacccccggcttacctggttgcgttgtttga 






Sequence 1606 

VISSTGCFVTFSVGSPSPTFSPVNGFLVVGVVIVFVPGSPFCLTRSLPGFKSGLNSRFLS 
NGISSVDVIGSPSTGPYFVTSSTGGVTTSPVSGFLTPGLPGTSSWLPFGAFGSNSSLWPG 
LISSPPYSVISSTGCFVIFSVGSPSPTFSPVNGFLVVGVVIVFVPGSPFCLTRSSPGFKS 
GLNSRFLSNGISSVDVIGSPSTGPYFVTSSTGGVTTSPVSGFLTPGLPGCVV*

Sequence 1607 

Contig_0625_pos_8861_7932,

is similar to (with p-value 0.0e+00)

>sp:sp|Q05615|AROA_STAAU 3-PHOSPHOSHIKIMATE 1-CARBOXYVINYLTR
ANSFERASE (EC 2.5.1.19) (5-ENOLPYRUVYLSHIKIMATE-3-PHOSPHATE
SYNTHASE) (EPSP SYNTHASE). >gp:gp|L05004|STAAROA_2 Staphyloc
occus aureus dehydroquinate synthase (aroB) gene, 3' end cds
; 3-phosphoshikimate-1-carboxyvinyltransferase (aroA) gene,
complete cds; ORF3, complete cds. NID: g152954.

atggatagagtgatgaaacctctgttaaaaatgaatgctaatatatctggaattgataat 
aattacacaccacttataattaagccttcaactattaaaggtataaattatcaaatggaa 
gttgcaagtgcacaagttaaaagtgctattttacttgctagcctattttcaaaagaagcc 
acgacacttacagaatttgatgtaagtagaaatcatacagaaacattgtttgcacacttt 

aacattcctatatcaattcaaggtaaaacaatccaaaccataccttatgcaattgsacat
atacagcctagagattttcatgttccaggagatatctcatctgctgcatttttcatagtc 
gcagccctaattacgcccggtagtgacattacaattcataatgttggcattaatcccact 
cgttcaggtatcatagatatcgttaaacaaatgggaggaaacattgaattaagtaatgta 
agcaagggtgcagaaccaactgcatcaatacatgtaaaatatacaccgaacttgaatgct 

gttacaattaaaggcgacttagttccaagggctatcgatgaattaccagttattgcacta
ctttgcacacaagcttcaaattcttgtattatcaaaaacgcggaagaattaaaagtgaag 
gaaacaaatcgtatagatactactgcagacatgctaaatttacttggttttaatttacaa 
cccacacatgatggacttattatacatccatcagagtttagatcaaatgcaactgtagat 
agtcaaacagatcatcgtataggaatgatgttagctgtagcttcactacttagttcagaa 

ccactcaaaatagagcaatttgacgctgttaatgtttccttcccaggttttttacctaaa
ctaaagcttttagaaaatgagggaaaataa 

Sequence 1608 

MDRVMKPLLKMNANISGIDNNYTPLIIKPSTIKGINYQMEVASAQVKSAILLASLFSKEA 
TTLTEFDVSRNHTETLFAHFNIPISIQGKTIQTIPYAIEHIQPRDFHVPGDISSAAFFIV
AALITPGSDITIHNVGINPTRSGIIDIVKQMGGNIELSNVSKGAEPTASIHVKYTPNLNA
VTIKGDLVPRAIDELPVIALLCTQASNSCIIKNAEELKVKETNRIDTTADMLNLLGFNLQ
PTHDGLIIHPSEFRSNATVDSQTDHRIGMMLAVASLLSSEPLKIEQFDAVNVSFPGFLPK 
LKLLENEGK* 


Sequence 1609 

Contig_0627_pos_595_1704, 

putative peptide of unknown function 

atggtaaacagtaatgatattgtttctattgttattagtgatattacacgtccaacgccc 
aaccatattcttgtacctttactaattgaggaattaaatcatgttcctcgtgagaatttc
gtaattattaatggtacagggactcatcgagatcaaacgcgagatgaattgattcaaatg 
ttaggtgaagatattgtaaattcagtaaaaatcgttcacaatcattgctcagaaaaagaa 
agtctagctaaagtgggacacagtcaatatggatgtgatgtttatttaaacaaagcatat 
gtagaatccgattttaaaattgtaacaggttttattgaaccacactttttcgccggattt 
tcaggtggacctaaagggataatgcctggaattgcaggtttagaaacaattcaaacattt
cataatgcaaaaatgattggcgatccgagatcaacgtggggaaatttagaagacaatcca 
gttcaagatatggcacgggaagttaaccgtatgtgtaaacctgactttttacttaatgtt 
gcattgaataaaagtaaagaaattactgcagcatttgctggtgaaatcttagatacacac
aaagaaggatgcgcatatgtaaaagatcatgcaatgtttaaatgtgagcaacgctttgat 
attgttatcgcatcaaattctggctatcctttagatcaaaatttatatcaaacagttaaa
gggatgagtgcagcgagtaaagttgttaaaaaagacggtcatattattatggtatctgag 
tgtgcagatggctttcctgatcatggtaagtttgccgaaattttcaaaatggcagacaca 
cctcaaggtattttagaacttattcacaatccaaactttaaggaagttgaccaatggcaa 
gtacaaaaacaagcaagtattcaaacttttgccaatgtgcatgtttattcagaacttact
gaccaacaacttaaagactcgatgttaatcccaacctctaacattgaacatacaatacaa
gaattagaacatcgatatggccgtaaattaaccattggtgttatgccacaaggtccttta 
acaataccgtacgtagaagataaagaataa 

Sequence 1610

MVNSNDIVSIVISDITRPTPNHILVPLLIEELNHVPRENFVIINGTGTHRDQTRDELIQM
LGEDIVNSVKIVHNHCSEKESLAKVGHSQYGCDVYLNKAYVESDFKIVTGFIEPHFFAGF
SGGPKGIMPGIAGLETIQTFHNAKMIGDPRSTWGNLEDNPVQDMAREVNRMCKPDFLLNV 
ALNKSKEITAAFAGEILDTHKEGCAYVKDHAMFKCEQRFDIVIASNSGYPLDQNLYQTVK 
GMSAASKVVKKDGHIIMVSECADGFPDHGKFAEIFKMADTPQGILELIHNPNFKEVDQWQ
VQKQASIQTFANVHVYSELTDQQLKDSMLIPTSNIEHTIQELEHRYGRKLTIGVMPQGPL 
TIPYVEDKE* 

Sequence 1611 
Contig_0627_pos_1748_2551,

is similar to (with p-value 1.0e-44)

>sp:sp|P73846|YH17_SYNY3 HYPOTHETICAL 30.2 KD PROTEIN SLR171
7. >gp:gp|D90910|D90910_32 Synechocystis sp. PCC6803 complet
e genome, 12/27, 1430419-1576592. NID: g1652956.

atgaaattagacgcactattgaaagacatgcagagtgtagtaattgccttctcaggtgga
gtagatagtagcttgttactgaaaaaagcgattgatattttaggtgttaactatgttaaa 
cctgttgtagtaaaatcagaattatttagaaatgaagagtttgaactagcgcttaaactt 
ggacaaagtctaggtgttgaagtattagaaactgaaatgtctgaacttcaagatgcgaat 
atcgttaaaaatacgcctgaaagttggtactatagcaagcgcttgatgtatagtcaactt 

gagaatattaagaataaactaggatttaattatgtgctagatggtatgattatggatgac
ttagatgattttcgtcccggattaaaagcaagagacgactttggtgttcgtagcgtttta 
caagaagcaaaactatataaatctgaagttagagaattaagtcatcaacatgacttgcct 
gtatggaataaaccagccttatgtagtctagcatcaagaataccttatggtgaggaatta 
agttttacaaaagttaacaaggtcaacgaagcagaaaaattcattttaagcctaggtatt 

aaccacgtacgagtacgctatcatcacaacatagcacgcattgaagtaacagaagatcaa
ttaaataatcttcttaaattgaaagacagtatcatattacatttgaaagaattaggattt 
gactatgtaacaatggatttagaaggctatcgtacaggtagtatgaatgaaatcattgat 
accaaatctacaagttttaaataa 

Sequence 1612

MKLDALLKDMQSVVIAFSGGVDSSLLLKKAIDILGVNYVKPVVVKSELFRNEEFELALKL 
GQSLGVEVLETEMSELQDANIVKNTPESWYYSKRLMYSQLENIKNKLGFNYVLDGMIMDD 
LDDFRPGLKARDDFGVRSVLQEAKLYKSEVRELSHQHDLPVWNKPALCSLASRIPYGEEL 
SFTKVNKVNEAEKFILSLGINHVRVRYHHNIARIEVTEDQLNNLLKLKDSIILHLKELGF 

DYVTMDLEGYRTGSMNEIIDTKSTSFK*

Sequence 1613 

Contig_0627_pos_2566_3342,
is similar to (with p-value 8.0e-40)
>sp:sp|Q57629|Y165_METJA HYPOTHETICAL PROTEIN MJ0165. >gp:gp
|U67473|U67473_9 Methanococcus jannaschii section 15 of 150
of the complete genome. NID: g2826256.

atgagccatagttataattctatagaagaggtgctcaaagctgtaaaatcaaatcaacta 
tctattaatgatgctaaagcccaactcagtcattatgacgaattgggctttgctaaaatt 

gacttacatagagcacagcgtcaaggatttcccgaagttatctttgggcaaggaaaaaca
aaagaacaaatcactaaaatcatctctagtttgatatttcataatgaagttattctagtg 
acacgtgttgatgaaatgaaagcaaaatacattttacaacattatccaaacttggaatat 
catcaaactgcacagttaattagcactccactaaaagatataccacaatctaaatactat 
gtttctgtactttgtgctggaacttctgatttacctattgcagaagaagctgcattaacc 

gctgaaatcatgggagtaagtgtaaaacgattttatgatgtcggggtttcaggtattcat
cgcttattatccaacattcatgatatacgcagagggaaagtttctatcgttatagctgga 
atggaaggcgctttagcaagtgttgttggaggattagtcaaccaccctgtatatgcagta 
ccaacgagtgtaggttatggagcaaacttgaatggggttaccaccctattatcaatgata 
aatagttgcgcacccggaaccagcgtattaaatatcaataatggatttggtggcggttac
aacgctgcacagattattcatatgctagaaaataaagagagtgaggtatctttatga

Sequence 1614

MSHSYNSIEEVLKAVKSNQLSINDAKAQLSHYDELGFAKIDLHRAQRQGFPEVIFGQGKT 
KEQITKIISSLIFHNEVILVTRVDEMKAKYILQHYPNLEYHQTAQLISTPLKDIPQSKYY
VSVLCAGTSDLPIAEEAALTAEIMGVSVKRFYDVGVSGIHRLLSNIHDIRRGKVSIVIAG
MEGALASVVGGLVNHPVYAVPTSVGYGANLNGVTTLLSMINSCAPGTSVLNINNGFGGGY 
NAAQIIHMLENKESEVSL* 

Sequence 1615

Contig_0627_pos_3387_3701, 

putative peptide of unknown function 

atgctactttctgctttagttgatttaggagcaaaccctgaagacattgaatcagaacta
aaaaaattacctttagatcaatttaagctacattttcaaaaaagagtaaaacaaggtatt 
catgcaatgacattaaacattgatgttaaagaagcaaatcatcatcgtcacgttaatgat
atatttaaaatgatagatgacagtacacttccggaaagggttaaatatcgcagtaagaaa 
atttttgaaatcattggtcaagcagaagctaaaattcatggcatgtcttcaagcatgttg 
ttaactttactttga 

Sequence 1616

MLLSALVDLGANPEDIESELKKLPLDQFKLHFQKRVKQGIHAMTLNIDVKEANHHRHVND 
IFKMIDDSTLPERVKYRSKKIFEIIGQAEAKIHGMSSSMLLTLL*

Sequence 1617 
Contig_0627_pos_5833_7239,

putative peptide of unknown function 

atggcaattctaacaccattatcaactacggattggcatgcatataaagttaatctaagt 
caatatttgactcaagaaaatggtcgttatttaggacatttatttgaatgggttgccgta 
cataataacataataagagctttaatatatgcgataacttcgtttttagttatctattta 

gttgcttatatggttcaattacatacgaatcgtatttattttattttgagttttgtgtta
atggttactgtacctaatacaatttatagcgaaacttacgggtggtttactggatttttt 
agttatatacctgctacagtcctatcactttttattctttttacggtagttaaaaagatt 
gagtcgcacgatacagtttctgaaatgcaattatgggtatttttattagtaagtttgttt 
ggacaattcttcttggagaatctttccatcgctaatagcttaattattttaataggaatg 

gtagtctatttctttgttaaaaaaagactcagttatttcttaattgtaggatttatgctt
agttgtataggtaacattataatgtttttaaacttcaattattttttaattaaggatgga 
ttaaatacgcattattcaatttccgatagtcatggaatgatacataaagcaggtgtgacg 
ttatttaagcttgtaccagaatatatgtttattaatcaaatgattattcttaccgtgata 
tcaatagtaagtatagttttacttaagcaaaataaaagcctgaagcatatgagagtttat 

attaaaataccactactcttaggtttaattactttacctatttataagatcttcgtttac
aatcaatttcattttgaattatataaagcttcattttctatagccgttttgaatacaacg
atttgcttcatttacatgataagtgtgatatacgttgtgtttaaaatgatacagcaaaga 
tacataagaatgattgtgatggggagttttatagctatggcttcatctgttttgccactt 
ttatttgtgacgcctataagttatagaaatttttattttatttatactttatggatcgtg 

atattactttgtttaattcagcaatgtgatgtgctatttaaacaacttgaacatataatt
aaaatatttgcgattatcatcagcatcattatgatgattggatttacttttatacatatt 
agtagtgtgcacagaatagacttcattaaagaacaaataacacaacatcatcgctatcag 
aaaataacattggaaagattaccatttgagcgatatactcatatgactacaccaaagtcg 
aaggaacaacttcaagatttcaaacactattatgatttgcccaaagacatcacatttaaa 

gtagtcccatatggtacaaaacaataa

Sequence 1618 

MAILTPLSTTDWHAYKVNLSQYLTQENGRYLGHLFEWVAVHNNIIRALIYAITSFLVIYL 
VAYMVQLHTNRIYFILSFVLMVTVPNTIYSETYGWFTGFFSYIPATVLSLFILFTVVKKI
ESHDTVSEMQLWVFLLVSLFGQFFLENLSIANSLIILIGMVVYFFVKKRLSYFLIVGFML
SCIGNIIMFLNFNYFLIKDGLNTHYSISDSHGMIHKAGVTLFKLVPEYMFINQMIILTVI
SIVSIVLLKQNKSLKHMRVYIKIPLLLGLITLPIYKIFVYNQFHFELYKASFSIAVLNTT
ICFIYMISVIYVVFKMIQQRYIRMIVMGSFIAMASSVLPLLFVTPISYRNFYFIYTLWIV
ILLCLIQQCDVLFKQLEHIIKIFAIIISIIMMIGFTFIHISSVHRIDFIKEQITQHHRYQ
KITLERLPFERYTHMTTPKSKEQLQDFKHYYDLPKDITFKVVPYGTKQ*

Sequence 1619 
Contig_0627_pos_7260_7646,
putative peptide of unknown function

atgaagttaacccgaatacattatgagattataaagtttatcatagttggtggaattaat 
acctttaactactatataacatacttatttttgttaaaggtgttacatgtgaattatatg 
gttagtcacattgttggatttattgtaagttttattatttcatattatttaaattgttat
tttgtatataaagtaaaacctacaatagaaaagtttttaagatttcctatcactcagata 
gttaatatggtaatgcaaacgttattattatatatattcgtaaagtggttgaatatcgct
tcagaaattgcaccttttgcgggtctaatcattacaatcccagtgacattcatactttct
aagtggttacttagagataaagtttaa 

Sequence 1620 

MKLTRIHYEIIKFIIVGGINTFNYYITYLFLLKVLHVNYMVSHIVGFIVSFIISYYLNCY
FVYKVKPTIEKFLRFPITQIVNMVMQTLLLYIFVKWLNIASEIAPFAGLIITIPVTFILS 
KWLLRDKV* 

Sequence 1621 
Contig_0627_pos_4306_3668,

putative peptide of unknown function 

atgaaaagaacagataaatatagagattcatacaaatatgatgaccaatatcaaaatcat 
cgtaaacgttcagaagaagatatgtatcgacaacatcaagagtcccaacagagagcaaat 
tcaaatcgtgcaacacaaagtgaaaatgatagagagtatgaaaatcatcctgaacgttat

tacaatggaagagactatcgacgtgagcagcaattggaagaagaaaatgaaaaatcaagc
aaaactaaaaaatggctgattgcaatcatagttattttactcattattgtagctatcttt
atcacacgtgcaattatcaatcataataatgataaagtaagtaatgaccctaacgtttca
caaaactataaaaaagaagttgaaaatcaaaacgacgacattaatcgacaagttgattca 
gccaaaagcgatataaaaaataaaaaggacacccaatcccaaattgataaactacaaaat 

caaattgatcaattaaaacaaaatgaagaaactaatgcggattctaaattcacaaaattt
tatcaaaaccaaatcgacaaactgaaaaatgcaaataacgctcaacttaataacgaaaat 
caaagtaaagttaacaacatgcttgaagacatgccatga 

Sequence 1622 

MKRTDKYRDSYKYDDQYQNHRKRSEEDMYRQHQESQQRANSNRATQSENDREYENHPERY
YNGRDYRREQQLEEENEKSSKTKKWLIAIIVILLIIVAIFITRAIINHNNDKVSNDPNVS 
QNYKKEVENQNDDINRQVDSAKSDIKNKKDTQSQIDKLQNQIDQLKQNEETNADSKFTKF 
YQNQIDKLKNANNAQLNNENQSKVNNMLEDMP* 

Sequence 1623

Contig_0629_pos_3864_2371,

putative peptide of unknown function 

atgatttacatgcgtgtagcaataataggtatgggaacagctggtgtaagtgtgttacgc 
caactcgttaaacatgaaaacttttctcaattaaaagtagatgtatatgacgatgttaga 

aatatgggccaaggtgttccatttcaaaatgatagtagcgaactacttattaacatgcca
tcaaaatccatgagcttaaatcttgatgatgatcaagagttttggaagtggtatcaaaat 
cagacggaatttaattttagtaatcctcaatatttgcctagatttgtatttggtcattat 
atgaagtcttatttatcttattataatgaccaatttgataatttaactattatcaatgat 
aaagtacaagaaatttttacacaatccgatgttgatgacacagatttaaaatatcatgta 

tgtacatgtgatgatgaaaaagaatggcgtgaatacgattatttatttttaacttttgga
acttttagttaccatgacccttatgatttgaagggaactaaaggctatatacaaacgccg 
tatcctacatatcatacacttgataatgttaaagattcagatcgaatcgtgattattgga 
acagggttggctagtttagatgctgtgagatatgtagctgcacatcatccatctttaccc 
attactatgacaagtcgttctgcagcattgccaagtgttagagggaaaatgactaaaatt 

cagtttacgcatttaactaaatcacgatttaatggaattatgaaaaatcactttggtaat
gtaccattagaaaaagtagtttcattatttttaaaagaatgtgaagattatggaatagat 
tttaaaaaacttatttatcgtagaaccggaaaccatgtcaaagacttggagtatgattta 
aatcatgaagaagaaatggggatattccaaagtatcattgaacatttaaaagaaaattta
aactggatttggaatagtttgagcgttaaagatcaagaaacttttaatcgtaaatacact
aaaattattcagttaaattctaatccaatgcctcctagaacagctcgtttacttatcaag
ttaatacagaataatgaacttgtcattaaaaaagggctagaagacatagtccataaaaat 
aatcaatttatgttgaagtataacgacactacgcaaaattatgagttgtttgacatcgtt 
attaatgcaacgggctctaaaacacatctttctcaattagatgaggatgatcaattaatt 
ttaaacttagaaaatagacaaattgttcaacgtcatcctatgggtggcattcaaattatc
ccagaaacaaatcaagtcataagccccagatatggaaccttaaaaaatgtgattgcaatt 
ggacaaatgaccaacggtgtcaataaacttagaaatggcgtaaagatgattgttaatcaa 
gttgttgatacagtatctcaattatatataacacaggaaaatagaaataagtaa 

Sequence 1624

MIYMRVAIIGMGTAGVSVLRQLVKHENFSQLKVDVYDDVRNMGQGVPFQNDSSELLINMP 
SKSMSLNLDDDQEFWKWYQNQTEFNFSNPQYLPRFVFGHYMKSYLSYYNDQFDNLTIIND 
KVQEIFTQSDVDDTDLKYHVCTCDDEKEWREYDYLFLTFGTFSYHDPYDLKGTKGYIQTP 
YPTYHTLDNVKDSDRIVIIGTGLASLDAVRYVAAHHPSLPITMTSRSAALPSVRGKMTKI 

QFTHLTKSRFNGIMKNHFGNVPLEKVVSLFLKECEDYGIDFKKLIYRRTGNHVKDLEYDL
NHEEEMGIFQSIIEHLKENLNWIWNSLSVKDQETFNRKYTKIIQLNSNPMPPRTARLLIK
LIQNNELVIKKGLEDIVHKNNQFMLKYNDTTQNYELFDIVINATGSKTHLSQLDEDDQLI
LNLENRQIVQRHPMGGIQIIPETNQVISPRYGTLKNVIAIGQMTNGVNKLRNGVKMIVNQ
VVDTVSQLYITQENRNK*


Sequence 1625 

Contig_0629_pos_1922_891,

is similar to (with p-value 3.0e-59)

>sp:sp|P17618|RIBG_BACSU RIBOFLAVIN-SPECIFIC DEAMINASE (EC 3
.5.4.-). >pir:pir|S45543|PN0100 ribG protein - Bacillus subt
ilis >gp:gp|L09228|BACDIA_10 Bacillus subtilis spoVA to serA
region. NID: g410114. >gp:gp|X51510|BSRIB_2 B. subtilis ribo
flavin biosynthesis operon ribG, ribB, ribA, ribH, and ribT
genes. NID: g40083. >gp:gp|Z99116|BSUB0013_40 Bacillus subti
lis complete genome (section 13 of 21): from 2395261 to 2613
730. NID: g2634723.

atggatgatgctattcaactagcaaaaatggtaaatggacaaacaggtgttaatccacca 
gtaggatccgttgttgttaaaaacggtaggattgtaggtttaggtgcacatttaaaaaag 
ggagataaacatgccgaagtacaagctattgaaatggcaggtttaaatacccaaggtgct 

accatatacgtttcattagaaccttgcacacaccatggttcaacaccaccttgtgtgcat
aaaatcattgaagcgggcatatctaaggtcatctatgctgttaaagatactactttagta 
agtaagggtgacgagattctgagagaagctggtatagaggttgaatttcaatataatgaa 
aatgcagctgcattataccgtgacttttttactgctaaaagaaacgaagttccagaagta 
actgtaaaggtctcatctagtctagatggtaaacaagcaacagactttaatgaaagtaag 

tggataacaaacaaagaagttaaagaagatgtttatcaattaagacatgagcatgatgca
gttattactgggcgtagaaccattgaagcagacaatccattgtatacaaccagggttcct
gatggaaagcatccgattcgagttattctttctaagaaaggtcaactcgattttaatcaa 
caaatatttaaagatactgcatcggagatatggatttacactgaaaatgaaaaattaaaa 
acaaataaaagttttattaaaataataaatattagtaattgtgatacaacgacaatatta 

caagacttatatcaaagaggtattgggaaactgctagtcgaggcaggcccaaatattaca
tctcaatttctccaatccaaacatctaaatgaactcattttatatatagccccgaaatta 
attggtggttctggcaaacatcaattttataagactgacgaggtcattgatttgcctgaa 
gcaactcaatttgaaattgttgattccaagttaattaatcaaaatttaaaattgaaatta 
cgaaagaagtga 


Sequence 1626 

MDDAIQLAKMVNGQTGVNPPVGSVVVKNGRIVGLGAHLKKGDKHAEVQAIEMAGLNTQGA
TIYVSLEPCTHHGSTPPCVHKIIEAGISKVIYAVKDTTLVSKGDEILREAGIEVEFQYNE
NAAALYRDFFTAKRNEVPEVTVKVSSSLDGKQATDFNESKWITNKEVKEDVYQLRHEHDA
VITGRRTIEADNPLYTTRVPDGKHPIRVILSKKGQLDFNQQIFKDTASEIWIYTENEKLK
TNKSFIKIINISNCDTTTILQDLYQRGIGKLLVEAGPNITSQFLQSKHLNELILYIAPKL 
IGGSGKHQFYKTDEVIDLPEATQFEIVDSKLINQNLKLKLRKK* 

Sequence 1627
Contig_0629_pos_890_252,

is similar to (with p-value 1.0e-44)

>sp:sp|P16440|RISA_BACSU RIBOFLAVIN SYNTHASE ALPHA CHAIN (EC
2.5.1.9). >pir:pir|S45544|A35711 riboflavin synthase (EC 2.
5.1.9) alpha chain - Bacillus subtilis >gp:gp|L09228|BACDIA_
11 Bacillus subtilis spoVA to serA region. NID: g410114. >gp
:gp|X51510|BSRIB_3 B. subtilis riboflavin biosynthesis operon
ribG, ribB, ribA, ribH, and ribT genes. NID: g40083. >gp:gp
|Z99116|BSUB0013_39 Bacillus subtilis complete genome (secti
on 13 of 21): from 2395261 to 2613730. NID: g2634723.

atgtctatgtttacaggtatcattgaagaaataggtactgtacaacaagttcgctctgaa 
caatcagtaagaacgcttgaaattaaagcacaaaacattttagttgatatgcatattggt 
gattcaataagtgttaacggtgcatgtttaactgtgatagatttcactgactcaagtttt 
tcagttcaagtcatcaaagggactgaaaacaaaacatatcttggaagtgttcaacgtaat 

acagaagttaatctcgaaagagccatgagtggaagtgggagatttggtggacatttcgtg
ttaggtcatgttgatgagcttggaacaatttctaaaatcaatgaaactgctaactcaaaa 
attatttctattaaaacaactaaaaatattttgaatcaaatggtaaagcaaggttctata 
actgtagacggagttagtcttactgtatttgatttacatgattatacttttgatatacat 
cttataccagaaacacgtcgatctactattctttcatctaaaaaagtgggcgacaaagtg 

cacttggagtctgacgtactattcaaatatgttgaaaacatcatgaatcaaaatcaatcg
cagttaacagaagaaaagcttagagcatttggtttttag 

Sequence 1628 

MSMFTGIIEEIGTVQQVRSEQSVRTLEIKAQNILVDMHIGDSISVNGACLTVIDFTDSSF 
SVQVIKGTENKTYLGSVQRNTEVNLERAMSGSGRFGGHFVLGHVDELGTISKINETANSK
IISIKTTKNILNQMVKQGSITVDGVSLTVFDLHDYTFDIHLIPETRRSTILSSKKVGDKV
HLESDVLFKYVENIMNQNQSQLTEEKLRAFGF*

Sequence 1629 
Contig_0630_pos__4 521_0,

putative peptide of unknown function 

gtgtgcatattagtatttaagcacagaaaagaacaattagcaggattgaaattttctatc 
agtttaaaagtgattgagcgtctacttttagcactcattctaccacttatcattttaatg 
attggcttgtttagctttaatacttatgctgatagtttcatcctattacaaacttcagat 
ttatcagtatcattattaactatattaattggtcatattttaatggcttttgtagtggag
tttggtttccgttcttacttacaaaatattcttgaaacaagaatgaacacattttttgcg
agtattgtcgttggtcttatttattcagtattt

Sequence 1630 

VCILVFKHRKEQLAGLKFSISLKVIERLLLALILPLIILMIGLFSFNTYADSFILLQTSD
LSVSLLTILIGHILMAFVVEFGFRSYLQNILETRMNTFFASIVVGLIYSVF

Sequence 1631 

Contig_0630_pos_2706_1588,

is similar to (with p-value 2.0e-83)

>sp:sp|P54955|YXEP_BACSU HYPOTHETICAL 41.6 KD PROTEIN IN IDH
-DEOR INTERGENIC REGION. >gp:gp|Z99124|BSUB0021_51 Bacillus
subtilis complete genome (section 21 of 21): from 3999281 to
4214814. NID: g2636442. >gp:gp|D45912|D45912_19 Bacillus su
btilis genome sequence between the iol and hut operon, parti
al and complete cds. NID: g1408482.

atgacaaattattctacttatgtagattggagaagaacgtttcatcaatatcctgaactt 
tcagatgaagaatatgaaactacagaaaagttacgaaaaatactcaaaagttatggtata 
cgtatactggaggtacctttaaaaacaggtttagtagcagaaattgggcaaggagaggaa 
atgatagcagtaagaacagatattgatgctttgcctatagaagaacaagtgaagcatgaa
tttacatcaaagtatcaaggtgcaatgcatgcatgtggtcatgatattcatatggcaagt 
atattagctactggtattcaactaaaagagattgaagatgaattaaatggacgcgttcga 
ttaatatttcaacctgctgaagaattaggacatggtgcatttgaaatcataaatactgga 
gtacttaaaggagctaaagcagtacttggttttcataattatcccactttaaaagttggt
gaatttgctattaaatcgggtgcaattacctctgctgtcgatcgttttgagtttaatgtt
aaaggtaaaggtgcgcatgctgcaaaacctgagcaaggaaatgatccagtcatcgtcgta 
ggacaacttatcaatagtttacaaactattgtgagtcgaaatttatcagcttttgatagt 
gcagttgtaacaatcggtgaaatttcttgtggtaacacatggaatgttatagctgacaaa 
gcttatatacagggcactgttcgttcattcgatgaggatatacgtcattatattgaaaat
aggatgaaaaatattgctgatggtttaagtcgtgtttttaatgtggatattgatttaact 
tattcaagactacctggtgcagtagtaaatgatgcacatctaacacaagaagcaatcgag 
gtcgctaaaaatgttggctatcatgtatcaatgctcgatgaaccggttactattggagag 
gatttttcaggttatacagaagaataccccggtgttttcgcatttattggctccgacagt 
aaatatgatttacatcatcctaaatatcatccagatgagcgtattttggaaaaagttcct
caatatttcgttcagctcgttcaacgtttattgacataa 

Sequence 1632 

MTNYSTYVDWRRTFHQYPELSDEEYETTEKLRKILKSYGIRILEVPLKTGLVAEIGQGEE
MIAVRTDIDALPIEEQVKHEFTSKYQGAMHACGHDIHMASILATGIQLKEIEDELNGRVR
LIFQPAEELGHGAFEIINTGVLKGAKAVLGFHNYPTLKVGEFAIKSGAITSAVDRFEFNV
KGKGAHAAKPEQGNDPVIVVGQLINSLQTIVSRNLSAFDSAVVTIGEISCGNTWNVIADK
AYIQGTVRSFDEDIRHYIENRMKNIADGLSRVFNVDIDLTYSRLPGAVVNDAHLTQEAIE
VAKNVGYHVSMLDEPVTIGEDFSGYTEEYPGVFAFIGSDSKYDLHHPKYHPDERILEKVP
QYFVQLVQRLLT*

Sequence 1633 

Contig_0630_pos_1491_610,

putative peptide of unknown function 

atgtcagctcaagatccgcgcaataaatttaaaactgataattatgaaaaacaagaacaa
gaagttccaggtatacaagctaaaatgtcaccacaaccagattgtggggaagattcttat 
catggccaccatcgattagatggctttaaaatactagtgactggtggcgattcagcaatt 
ggacgtgcggcagcaattgcttatgctaaagaaggtgcagatgtagcgattaattattta 
ccaagtgaacaacaagatgccgatgatgtaaaacagattattgaaaatgttgggcaaaaa 

gctatcttaattcctggtgatattagagatgaacaattcaactatgacatggttgaaaag
gcttatcagcaattaggtggtttagacaatgtaacgttggttgctggtcatcaactttat
caagatgaattatcggagtttaaaactcaagattttaccgaaacgtttgaaacgaatgtc 
tatccggtattttggacagtccaaaaagcgcttgagtatttacaaccaggaggctcgatt 
acaacaacatcttcagttcaaggttataatcctagtccaattcttcatgattatgctgca 

acgaaagctgcaattatatctttaacaaagagtttttcagccgaacttggccctaaaggt
attcgtgttaactgtgttgcacctggaccgttttggtcaccacttcaaattgtcggtgga 
caaccacaaagcgctatacctacttttggacaaaacacaccgttaggacgtgccggccag 
ccagttgaatgtgctgggacatatgtgttattagcctctgatgatgcaagttatattacc 
ggtcaagtatatggtgtgactggtgggactcaaatagattaa 

Sequence 1634 

MSAQDPRNKFKTDNYEKQEQEVPGIQAKMSPQPDCGEDSYHGHHRLDGFKILVTGGDSAI 
GRAAAIAYAKEGADVAINYLPSEQQDADDVKQIIENVGQKAILIPGDIRDEQFNYDMVEK
AYQQLGGLDNVTLVAGHQLYQDELSEFKTQDFTETFETNVYPVFWTVQKALEYLQPGGSI 
TTTSSVQGYNPSPILHDYAATKAAIISLTKSFSAELGPKGIRVNCVAPGPFWSPLQIVGG
QPQSAIPTFGQNTPLGRAGQPVECAGTYVLLASDDASYITGQVYGVTGGTQID*

Sequence 1635 

Contig_0630_pos_0_323,

is similar to (with p-value 7.0e-29)

>gp:gp|Z99122|BSUB0019_105 Bacillus subtilis complete genome
(section 19 of 21): from 3597091 to 3809700. NID: g2636029.
>gp:gp|Z93767|BSZ93767_8 B. subtilis DNA; 15.2 kb fragment,
from ywqN gene to ywrO gene. NID: g1929325.

gtgaaaaaaataaaagctgaaacactttcagatatgcaaaattataaattacttagtgga
agtattatacctaggccgatagcatttgtaactactcaaaatttaaaaggggatatcaac 
gcagccccgtttagtttttttaatgtagttaatcatacaccacctatgattgcaattgct 
gttcaacgtacaaagggaaatagaaaagacacctcaataaatatagaacaatcaggtgag 
tttgtagtgcatattactgatgaggctattgttaatgatgtgaatgaaactgctgccccg 
ttagaatatggtgttaatgagtg

Sequence 1636 

VKKIKAETLSDMQNYKLLSGSIIPRPIAFVTTQNLKGDINAAPFSFFNVVNHTPPMIAIA
VQRTKGNRKDTSINIEQSGEFVVHITDEAIVNDVNETAAPLEYGVNEX

Sequence 1637 

Contig_0633_pos_1066_1476,

putative peptide of unknown function 

atgcacaaatataataatcaaaaatataaagagggaaatacagtgcctgtgcatgttcaa
cacatgattgctcaattcatgataagtgggaaatttgaaagacatttgaataaaatgcga 
aagatatatagagataaacttgattatattttaaaacgattaaagccctacaatactcaa 
ataaagattgaaggcgcactaactggaatgcattttacaataactgttaataatggattg 
tcaatgaaacaatgtttaaaaaatgcgaaaaaaaataatttaaaattaaaaccttatcat 

tacgaaaattattctaaagtttatccaaaatttattttaggatttggggggataaaaaaa
gaagaattagaagatcatgttaatgcattaattcattcactcgttatataa 

Sequence 1638 

MHKYNNQKYKEGNTVPVHVQHMIAQFMISGKFERHLNKMRKIYRDKLDYILKRLKPYNTQ
IKIEGALTGMHFTITVNNGLSMKQCLKNAKKNNLKLKPYHYENYSKVYPKFILGFGGIKK
EELEDHVNALIHSLVI*

Sequence 1639 
Contig_0634_pos_695_1129,

is similar to (with p-value 5.0e-21)

>sp:sp|P30267|YKAA_BACFI HYPOTHETICAL 50.9 KD PROTEIN IN KAT
A 3'REGION (ORF A). >pir:pir|S27491|S27491 hypothetical prot
ein A - Bacillus firmus >gp:gp|L02548|BACKATA2_1 B.firmus OR
F A and ORF B, complete cds. NID: g143118.

gtgatgaaaaatatttggttgaatttcaaagaaagtctgattatgactatgaatatctta
cccaccatattatcaataggtttaatttgcttgttactcgcagaatatacagtgattttc 
gattatttagcatatgttttttatccattaacttggatacttcaaataccagattccttt 
ttaactgcaaaaggcgcagctattggtataacagaaatgtttttgccttcattaattgta 
gtcgaagcaccattaatcactaaatttataattgctgttacttctgtttctacaattata 

ttcttttcagctagtgtgcctagtattctctctactgatatacccatccgcataagagat
ttagtggttatatggtttgagagaactgtattgagtttaattatagtaacacctatcgca 
tatatttttttataa 

Sequence 1640

VMKNIWLNFKESLIMTMNILPTILSIGLICLLLAEYTVIFDYLAYVFYPLTWILQIPDSF
LTAKGAAIGITEMFLPSLIVVEAPLITKFIIAVTSVSTIIFFSASVPSILSTDIPIRIRD
LVVIWFERTVLSLIIVTPIAYIFL*

Sequence 1641 
Contig_0634_pos_1506_1973,

is similar to (with p-value 1.0e-76) 

>gp:gp|L23109|STASINA_1 Staphylococcus aureus recombinase (s
in) gene, complete cds. NID: g495088. 

gtgagaatgggagatcgttttgtcgtggaatcaattgatcgtttaggtcgtaattatgat 
gaagtcattaataccgttaattatcttaaagagaaagaagtacaattgatgattaccagt
ctaccaatgatgaatgaagtcattggtaatccattattagataaatttatgaaagattta 
attatacagatattagcaatggtttcagaacaagaaagaaatgaaagtaaacgtcgacaa 
gctcaagggattcaagttgcgaaagagaaaggcgtatataaaggacgacctttattatat 
tctcccaatgcgaaagatcctcaaaaacgtgttatctatcatcgtgttgtagaaatgtta 
gaggaaggcaaagcgattagtaaaattgcgaaagaagtgaatattacaagacaaactgtt
tatagaattaaacacgataacgatgatttgggttctgttgcaaagtaa 

Sequence 1642

VRMGDRFVVESIDRLGRNYDEVINTVNYLKEKEVQLMITSLPMMNEVIGNPLLDKFMKDL
IIQILAMVSEQERNESKRRQAQGIQVAKEKGVYKGRPLLYSPNAKDPQKRVIYHRVVEML
EEGKAISKIAKEVNITRQTVYRIKHDNDDLGSVAK* 

Sequence 1643
Contig_0634_pos_1281_862,

putative peptide of unknown function 

atgattagggtgtaccctaaatatataattgagtttaaaataaaaaacagtaaattaaga 
tggaaatttcatatccttgctttccttatcactataaatatttgtaaaaaaggtgatttt 
tttatatacgatattttattattgaattataaaaaaatatatgcgataggtgttactata 
attaaactcaatacagttctctcaaaccatataaccactaaatctcttatgcggatgggt
atatcagtagagagaatactaggcacactagctgaaaagaatataattgtagaaacagaa 
gtaacagcaattataaatttagtgattaatggtgcttcgactacaattaatgaaggcaaa 
aacatttctgttataccaatagctgcgccttttgcagttaaaaaggaatctggtatttga 

Sequence 1644

MIRVYPKYIIEFKIKNSKLRWKFHILAFLITINICKKGDFFIYDILLLNYKKIYAIGVTI
IKLNTVLSNHITTKSLMRMGISVERILGTLAEKNIIVETEVTAIINLVINGASTTINEGK
NISVIPIAAPFAVKKESGI*

Sequence 1645

Contig_0635_pos_1358_1813, 

putative peptide of unknown function 

atggtaagaagaatagaagatcacatctcatttttagaaaaatttattaatgatgttaat 
acattaacggcaaagttacttaaagacttgcaaactgagtatggcatatcagctgagcaa
tctcatgtgttaaatatgcttagtatagaggcgttaactgtggggcaaattacagagaaa 
caaggtgttaataaagctgctgttagtcgaagagtcaaaaagttgctcaatgctgaatta 
gttaaattagaaaaacctgattccaatactgaccaacgtcttaaaataattaaattatct 
aataaaggaaaaaaatatattaaagagagaaaagcgattatgagccatattgctagtgat 
atgacgagtgactttgacagtaaggaaattgaaaaagttagacaggttttagaaattatc
gactatcgtatacaatcttatacttctaaactttga 

Sequence 1646

MVRRIEDHISFLEKFINDVNTLTAKLLKDLQTEYGISAEQSHVLNMLSIEALTVGQITEK 
QGVNKAAVSRRVKKLLNAELVKLEKPDSNTDQRLKIIKLSNKGKKYIKERKAIMSHIASD
MTSDFDSKEIEKVRQVLEIIDYRIQSYTSKL* 

Sequence 1647 

Contig_0635_pos_2075_3481,
is similar to (with p-value 3.0e-35)

>gp:gp|AF051917|AF051917_2 Staphylococcus aureus plasmid pSK
41, complete sequence. NID: g3676412. 

atgagtagaggtgataaaatgagtcaatgcccaaattgtggtcatcaagtgaaagatgat 
acatcgcaatgtccaaactgtgggcaactattaactaagaagaaaaaaagaaagattaaa 

gaccaatcatctcaatcgagtaatgagaattctaccaatatacgtcttcgtaaaattgtg
ccgataggtattagtgtatttatcttaatacttattatcgtgttatttttccttttaaga 
aattataattcgcctaatgcacaagctaagatattagttaatgctgtagataataatgat
tcacaaaaagttgctacattattgagtactaaaaataaaaaagtagacgatgttgaagcg 
caacaatatattaattatgtaaaaaaagaagtaggtattaagaagtatattcgagatatc 

aataatactgtagataaattgaataaaagtaattcaagcgtggcatcttatatacaaacg
aaaagtggacaagatgtacttaagataagtaaaaatggtacaaagtatttaatttttgat 
aatatgagtttcacagctccgactaaaaagccaattattaaacctaaagtagaaactaaa 
tatgaatttagaacaagtggtaaaaagaaaactgtcattgctgaagcaaataaaaataca 
cctttgggtgaatttattcctggtacatatcatttaccagctaagaaaattacagaaaac 

ggtacattcaatgggcatttaaattttgactttagagaaagccactctgaaaccgtagat
gtagctgaagattatgatcaatcatttatcaatatcaaatttaagggtgcgaataaatta 
agtgataaatcagaaaaagttcaaatcaatgaccgtacattcacttattctcattctaaa 
gaatttggtccttatccaaaaacaaaagatataacgatttctgcaactggtaaggcaaaa 
ggtaagacgtttagttctgagacgaaaacaattagtgcagacgatttgaaagataatacg 
aaagttacattggaatttgatagtgataaaataaatagctatgttgagaagaaagaaaaa
gaagaaaatagtttgaaaaataaattaactgaattttttactggttatgcaacggctatg 
aattcagcatttaatatgaatgattttaactttatatcgagttattttaaaaagaattcg 
tctatatacacatcaatgaaaagtaatttccaaaatcgaacgaacgtgactatgatatct 
ccgcaagtgttaagtgttcatcgaaacggacatactgtaagaacaactattcaacatatc
gatcatattggtaattatataaataaagattatgaattagaaatagataatgatgatagt 
aatatgcagttggttaaagaattataa 

Sequence 1648 

MSRGDKMSQCPNCGHQVKDDTSQCPNCGQLLTKKKKRKIKDQSSQSSNENSTNIRLRKIV
PIGISVFILILIIVLFFLLRNYNSPNAQAKILVNAVDNNDSQKVATLLSTKNKKVDDVEA 
QQYINYVKKEVGIKKYIRDINNTVDKLNKSNSSVASYIQTKSGQDVLKISKNGTKYLIFD 
NMSFTAPTKKPIIKPKVETKYEFRTSGKKKTVIAEANKNTPLGEFIPGTYHLPAKKITEN 
GTFNGHLNFDFRESHSETVDVAEDYDQSFINIKFKGANKLSDKSEKVQINDRTFTYSHSK 

EFGPYPKTKDITISATGKAKGKTFSSETKTISADDLKDNTKVTLEFDSDKINSYVEKKEK
EENSLKNKLTEFFTGYATAMNSAFNMNDFNFISSYFKKNSSIYTSMKSNFQNRTNVTMIS 
PQVLSVHRNGHTVRTTIQHIDHIGNYINKDYELEIDNDDSNMQLVKEL* 

Sequence 1649
Contig_0635_pos_5479_6126,

putative peptide of unknown function 

atgaaaaaaatgatattaatcaatgtgattactatcattgtcctagttgttattggtgtg 
ttaggcttttggttctggcataacacaacaagttatgtgacaactgacaatgcaaaagtt 
gatggagatcaaataaaaatctcaagtcctgcatctggacaaattaaatctcttaatgtt 

aagcaaggagacaaacttgataaaggtgataaagtagcagaagttttagcacaaggccaa
gatgggcaatcaaaagatatgaacatcaaaatgccacaaaaaggtactattgttaaaaca 
gatggtatcgaaggttctatgactcaagcagggaacccaattgcatatgcatataattta
gatgatctatatataactgctaatgtagatgaaaaagatatttctgacgtggaaaaaggc 
aacgacgttgatgtagatatcgacggtcaaaaagcatcaatcaaaggtaaggttgaagaa 

gtaggccaagcgactgcagctagcttttcattgatgccatcatcaaatagcgacggtaac
tatacgaaagtttctcaggtagtacccgtaaaaatctctttagattctaatccatctaaa 
aatgttgtcccaggtatgaacgctgaagttaaaattcataaaaattaa 

Sequence 1650 

MKKMILINVITIIVLVVIGVLGFWFWHNTTSYVTTDNAKVDGDQIKISSPASGQIKSLNV
KQGDKLDKGDKVAEVLAQGQDGQSKDMNIKMPQKGTIVKTDGIEGSMTQAGNPIAYAYNL 
DDLYITANVDEKDISDVEKGNDVDVDIDGQKASIKGKVEEVGQATAASFSLMPSSNSDGN 
YTKVSQVVPVKISLDSNPSKNVVPGMNAEVKIHKN*

Sequence 1651

Contig_0635_pos_6139_7053,

is similar to (with p-value 9.0e-47) 

>sp:sp|P54585|YHCA_BACSU HYPOTHETICAL 58.3 KD PROTEIN IN GLP
D-CSPB INTERGENIC REGION. >gp:gp|Z99108|BSUB0005_169 Bacillu
s subtilis complete genome (section 5 of 21): from 802821 to
1011250. NID: g2633055.

atgacagcgaccttcattattatatatattgtagtagcgctcatactcattggttttatt 
aatttctttttaattaagcgtaaaagaaaaaataaagacaaaagagtggaacaacgttcg 
acaatagattctaagagagaaagcaatcaatctaaatttaaagcaagcgatttagaacaa 

acaactaagtcaaatactgacccaacgcaatcaaacgatattgaagatgaaaaacgaaaa
aatcactttgactcagaaatagataatgcatctcaatctatcaatacagatagtaaagag 
gatagaaacgcgttaagtcataagaaccaagaggaagatgacgcatcgaacgatgtgttg 
aaccctatcgatccaaattctactgaaggtagagttaatgaaagaattaaaaatcaagag 
tctaactttatttttggtaaaggcataactagaggtaaaattttagcggcaatgttattt 

ggtatgtttatcgcgattctaaaccaaactctattaaatgtggcattgcctaaaataaat
acagagtttaatatttctgcctcaactggtcaatggttaatgactggttttatgttagtg 
aatggtatattaatacctattagtgcctttttatttaataaatattcttatagaaaatta 
tttattataggtttagcactatttacattaggttccttagtttgtgcaatctcatttaat 
ttcccaattatgatgagtggacgtgtattacaagccataggcgcaggtatattgatgccg 
ttaggttctaacgttattgttaccattttcccacctgaaaaacgcggtgtggcaatgggg
acaatgggtattgcaatgatattagcacctgcaatcggtccaacactttcaggttacata 
gtgttgcgttactag 

Sequence 1652

MTATFIIIYIVVALILIGFINFFLIKRKRKNKDKRVEQRSTIDSKRESNQSKFKASDLEQ
TTKSNTDPTQSNDIEDEKRKNHFDSEIDNASQSINTDSKEDRNALSHKNQEEDDASNDVL
NPIDPNSTEGRVNERIKNQESNFIFGKGITRGKILAAMLFGMFIAILNQTLLNVALPKIN
TEFNISASTGQWLMTGFMLVNGILIPISAFLFNKYSYRKLFIIGLALFTLGSLVCAISFN
FPIMMSGRVLQAIGAGILMPLGSNVIVTIFPPEKRGVAMGTMGIAMILAPAIGPTLSGYI
VLRY* 

Sequence 1653 

Contig_0635_pos_7086_7757,

is similar to (with p-value 3.0e-49)

>gp:gp|AF044668|AF044668_5 Salmonella typhimurium (g30k) gen
e, partial cds; and 50S ribosomal protein L32 (rpmF), PlsX (
plsX), 3-oxoacyl-acyl carrier protein synthase III (fabH), m
alonyl CoA-acyl carrier protein transacylase (fabD), and 3-o
xoacyl-acyl carrier protein reductase (fabG) genes, complete
cds. NID: g3282798.

atgggacatagcttaggagaatattcaagcttagttgctagtggtgtattatcttttgaa 
gatgcggttagaattgtgcgtaaacgtggccaacttatggctcgagcgtttcctaacggt
gttggaggtatggcagcagtattaggtttggattatgatgatgttgataagatatgtcaa 

acgttatctacaaaagaacagttaattgaacctgctaatattaactcaccaggtcaaatc
gtggtgtctggacataaatctttaattgatgaattagtagaaaagggcaaagaacttggt 
gctaaacgcgttcttccattagctgtttccggtccttttcattcttcaatgatgaaagtt
attgaagaggattttgctaatttcattaatcaatttgaatggcataatgctaattatcca 
gttgttcagaatgttaatgcaaagggagaaaccgatgctgaagtaattaaacgcaatatg 

gttaaacaattatattcacctgttcaatttattcaatcaacggagtggttgattaatcaa
ggtgtcgatcactttattgaaattggaccgggaaaagtattatctggacttatcaaaaaa
ataaatcgagatgtaaaaatcacttcaattcaaacactcgaagatgtgaaaggatggaat 
aatcatgaataa 

Sequence 1654

MGHSLGEYSSLVASGVLSFEDAVRIVRKRGQLMARAFPNGVGGMAAVLGLDYDDVDKICQ 
TLSTKEQLIEPANINSPGQIVVSGHKSLIDELVEKGKELGAKRVLPLAVSGPFHSSMMKV 
IEEDFANFINQFEWHNANYPVVQNVNAKGETDAEVIKRNMVKQLYSPVQFIQSTEWLINQ 
GVDHFIEIGPGKVLSGLIKKINRDVKITSIQTLEDVKGWNNHE* 

40 

Sequence 1655 

Contig_0635_pos_7969_8484, 

is similar to (with p-value 2.0e-65) 

>sp:sp|P51831|FABG_BACSU 3-OXOACYL-[ACYL-CARRIER PROTEIN] RE
DUCTASE (EC 1.1.1.100) (3-KETOACYL-ACYL CARRIER PROTEIN RED
UCTASE). >gp:gp|U59433|BSU59433_3 Bacillus subtilis PlsX (pl
sX), malonyl-CoA:Acyl carrier protein transacylase (fabD) an
d 3-ketoacyl-acyl carrier protein reductase (fabG) genes, co
mplete cds, and acyl carrier protein (acpP) gene, partial cd
s. NID: g1502418.

gtggtaagtcagtttggttctgtagatgtattggttaacaatgcagggataactaaagac 
aacttacttatgcgtatgaaagaacaagaatgggatgacgtgattgatacgaacttaaaa
ggtgtgtttaactgtattcaaaaagtaacgccacaaatgttgcgtcaacgtagtggtgca 
atcattaatctaactagtattgttggtgcaatgggtaatcctggacaagcaaactatgtt 
gcaacaaaagcaggtgtcattggattaacaaaaactgcagcacgagaactagcatcacga
ggtattacagtgaacactgtagcacctggtttcatcgtttcagacatgacaaatgcttta 
agtgatgatttgaaggatcaaatgttagagcaaattcctttaaaacgttttggagaagat 
acagatatagctaatactgttgccttcctagcttctgataaagctaaatatattacaggc 
caaacaattcatgttaacggtggaatgtatatgtaa 
Sequence 1656

VVSQFGSVDVLVNNAGITKDNLLMRMKEQEWDDVIDTNLKGVFNCIQKVTPQMLRQRSGA
IINLTSIVGAMGNPGQANYVATKAGVIGLTKTAARELASRGITVNTVAPGFIVSDMTNAL
SDDLKDQMLEQIPLKRFGEDTDIANTVAFLASDKAKYITGQTIHVNGGMYM*

Sequence 1657 

Contig_0635_pos_9060_9797,
is similar to (with p-value 2.0e-63) 
>sp:sp|P51833|RNC_BACSU RIBONUCLEASE III (EC 3.1.26.3) (RNAS
E III). >gp:gp|D64116|D64116_3 Bacillus subtilis genes for O
RF1, ORF2, ORF3, ORF4 and Srb, partial and complete cds. NID
: g1389548.

gtggctaaccagaaaaagaaagaaatggtacatgattttcaacaaaaatttactgataaa 
atgaagtcgttaggattacgttttaaaaatattgatttatatcaacaggcattctctcat
tcaagttttattaatgactttaatatgaatcgtttagaacacaacgaacgcttagaattt 
ttaggtgatgcggtattagaattgacggtttcacgctatctttttgacagacatcctcat 
ttaccagaaggtaatttgacaaagatgcgcgcaacaattgtttgtgaaccttcacttgtg 
atatttgcgaataagattaaattaaacgaactgattttattaggtaaaggtgaagagaag 
acaggaggcagaacaagaccttcccttatttcagatgcatttgaagcttttgtaggtgca
ctgtatttagatcaaggtttagattcagtatggacatttgctgaaaaagtcatctttccg 
tatgtagaagatgacgagcttgttggtgtcgtagactttaaaacacagttccaagaatat 
gtgcatagccaaaataaaggagatgtgacataccaattaattaaagaagagggtcccgca 
catcatagactatttacatcggaagtgattttagaaaataaagcagttgcagagggtaaa
ggaaagacaaagaaagaatccgaacaaaaggcagcagaacaagcgtataaactaatgaaa
aataaaaaatcattataa 

Sequence 1658 

VANQKKKEMVHDFQQKFTDKMKSLGLRFKNIDLYQQAFSHSSFINDFNMNRLEHNERLEF 
LGDAVLELTVSRYLFDRHPHLPEGNLTKMRATIVCEPSLVIFANKIKLNELILLGKGEEK
TGGRTRPSLISDAFEAFVGALYLDQGLDSVWTFAEKVIFPYVEDDELVGVVDFKTQFQEY
VHSQNKGDVTYQLIKEEGPAHHRLFTSEVILENKAVAEGKGKTKKESEQKAAEQAYKLMK
NKKSL*

Sequence 1659

Contig_0635_pos_10530_0, 

is similar to (with p-value 0.0e+00) 

>sp:sp|P51834|YLQA_BACSU HYPOTHETICAL 135.4 KD PROTEIN IN RN
C-SRB INTERGENIC REGION (ORF4). >gp:gp|D64116|D64116_4 Bacil
lus subtilis genes for ORF1, ORF2, ORF3, ORF4 and Srb, parti
al and complete cds. NID: g1389548.

gtggaaccgttaaaagaagaggcggccattgctaaagaatataagcaattatctaaagag 
atggagcaaagtgatgtcatcgttacagtatctgacattgatcattatactgaagataat 
cagcgattagatgagcgtttaaatcacctaaagagtcaacaggctgagaaagaaggtcaa 

caagctcaaatcaatcagttactacaaaaatataaaggtaaacgtcaacaaaacgattat
gacattgaaaagttaaattatgaattagttaaagcaactgagaattatgagcaattatca 
ggtaagctaaatgtattagaagaacgaaagaaaaaccaatcagaaacaaatgcaagatat 
gaggaagaattagataatttagaatcacaaattgattctattaaaaatgaaaaagcccaa 
aatgaaaaattattagctgagttaaaaaataagcaaaagcaattaaacaaggaagttcaa 

gaattagagtcacttctttatatatccgatgaacaacacgacgaaaaactagaagaaatt
aaaaatagttattatacattgatgtcagaacaatcagatgttaataatgatataagattt 
ttagaacatacaatcaatgaaaatgaagcaaaaaaatcacgattagattcgcgtttagta 
gaggctttcaatcaacttaaagacattcaacaaaatattactcaaacacaaaaggaatac 
caaagttctaagaaatctatggaaaaagtagaacaaaatattcaacaattagaacaacag 

ttgacagattctaaaagacttctatctgaatatgaaaataaactatatcaagcctatcgt
tataatgaaaagttaaaatcaagaattgatagtttagctactcaagaggaagattacacg
tatttctttaatggtgtaaagcacattttgaaagcaaaagataaagaattaagaggaatt 
catggtgcggttgcagaagtgattaacgttccttcagaaatgacacaagcgattgaaacc 
gccttaggtgcatcgttacagcacgttattgttgataatgaaaaagacggtcgccaagca 
atccagtacttgaagcaaagaggtttaggtcgtgctacttttttaccattaaacgtgatt
caaccaagacatgtagctgctgacattaaagatgtagctcgtggttcacaagggttcatt 
aatattgcatctgatgccataaatgtatctgctaaatatcaaaacatcattgaaaattta 
ttaggtaataccatcattgtagaaaatttaaaacatgcaaatgaacttgcacgtgccatt 
cgatatcggacaagaatagtaactttagaaggtgatgttgtaaatcctggtggttccatg
acaggaggaggagcacgtaaaacaaaaagtatattgtcacaaaaagatgaattatcaaca 
atgcgaaatcaacttgaagattatcaacgacaaacagcagaatttgaacgtcagtttaaa 
gaacaaaaaacacaagctgaacaattaagtgaacaatattttagtgcaagtcagcagtac 
aacaatttaaaagaacaagtacatcatcacgaattagaactggatagactaaaaacacaa 

gaagcacatcttaaaaatgaacatgaagagtttgaatttgaaaaaaatgatggatatcaa
agtgataaaagtaaagaaacattaaaagaaaaacaaaatcatttaattgagatacaacaa 
caattgaagcaactagaaagtgatattgaaagatatacacaattatcaaaagaaggaaaa 
gcttcgacacatcaaacacaacaacaactacatcaaaaacaatctgatttagctgttgtt 
aaagagcgaattaaatcgcaaaagcaagtttatgaacgtttagataaacaacttagcgat 

tcagaacgtcaaaaaattgaagtaaatgaaaaaatcaaattgtttaattcagatgaaatg
atgggtaaagatgcttttgaaaagttgaaagagcaaattcagcaacaagaaaatgtaaga 
caaaatttaaatcaacaacttagtgagattaaacagcaacgtaaagatcttaatgagaaa 
atcgaaataaatgaaagtcagcttcaaaaatgtcatcaagatatactttctatagaaaat 
cattatcaagatattaaagcaaaacaatcaaagctagatgtattaatcaaccatgcaata 

gatcatttaaatgacacgtatcaactcacagtagaacgtgcaagaatggaatatgattct
gatgaaactattgacaatttgcgtaaaaaagtaaaattaacgaagatgacaatcgatgaa 
ttaggtcctgtaaatttaaatgcaatagaacaatttgaagagttgaatgaacgatacaca 
tttttaaatgagcaacgaacagatttaagagaagcaaaagaaaccttagaacaaatcatt 
catgaaatggataaagaagttgaaggtcgttttaagacaacatttcatgcggttcaagat 

cattttacgacagtgtttaagcaattatttggtggtggacaagcagaacttcgtttaact
gaagatgactatttgtctgctggcgttgatatcatcgtacaaccgccaggaaaaaaatta 
caacatctttcatta 

Sequence 1660 

VEPLKEEAAIAKEYKQLSKEMEQSDVIVTVSDIDHYTEDNQRLDERLNHLKSQQAEKEGQ
QAQINQLLQKYKGKRQQNDYDIEKLNYELVKATENYEQLSGKLNVLEERKKNQSETNARY 
EEELDNLESQIDSIKNEKAQNEKLLAELKNKQKQLNKEVQELESLLYISDEQHDEKLEEI 
KNSYYTLMSEQSDVNNDIRFLEHTINENEAKKSRLDSRLVEAFNQLKDIQQNITQTQKEY 
QSSKKSMEKVEQNIQQLEQQLTDSKRLLSEYENKLYQAYRYNEKLKSRIDSLATQEEDYT

YFFNGVKHILKAKDKELRGIHGAVAEVINVPSEMTQAIETALGASLQHVIVDNEKDGRQA
IQYLKQRGLGRATFLPLNVIQPRHVAADIKDVARGSQGFINIASDAINVSAKYQNIIENL
LGNTIIVENLKHANELARAIRYRTRIVTLEGDVVNPGGSMTGGGARKTKSILSQKDELST
MRNQLEDYQRQTAEFERQFKEQKTQAEQLSEQYFSASQQYNNLKEQVHHHELELDRLKTQ 
EAHLKNEHEEFEFEKNDGYQSDKSKETLKEKQNHLIEIQQQLKQLESDIERYTQLSKEGK 

ASTHQTQQQLHQKQSDLAVVKERIKSQKQVYERLDKQLSDSERQKIEVNEKIKLFNSDEM
MGKDAFEKLKEQIQQQENVRQNLNQQLSEIKQQRKDLNEKIEINESQLQKCHQDILSIEN 
HYQDIKAKQSKLDVLINHAIDHLNDTYQLTVERARMEYDSDETIDNLRKKVKLTKMTIDE 
LGPVNLNAIEQFEELNERYTFLNEQRTDLREAKETLEQIIHEMDKEVEGRFKTTFHAVQD
HFTTVFKQLFGGGQAELRLTEDDYLSAGVDIIVQPPGKKLQHLSL 

Sequence 1661 

Contig_0635_pos_5254_4787,

putative peptide of unknown function 

atgatatgtgcatatagtggcgtgaatcgctctactttttatgatcattttcaagataaa 
tatcaattactagataagatccaaaattatcatttaaacaaatatatatctttactacaa
tctttctataacgattttcatcatatcaaaacagatcaaaaaaaattatataaatttttc 
ttattgatagccaaatatattaaacgtaaagaagcgttctacagagcaacacttgtaaca 
tatcctaataaagatattgcattagattacattaacgccactaaaacatgttatgaaaaa 
gtcatgaatagatatgaaacctcaataaataataaacgtatgtttatcatttattcagtc 
ggtggtcaagcaggtgtatttatcgattggttacgtaatggatgcatcgaatctcctcaa
gaggtcgctcaagttcttttagctaatacaattaaattacaacgataa 

Sequence 1662 

MICAYSGVNRSTFYDHFQDKYQLLDKIQNYHLNKYISLLQSFYNDFHHIKTDQKKLYKFF 
LLIAKYIKRKEAFYRATLVTYPNKDIALDYINATKTCYEKVMNRYETSINNKRMFIIYSV
GGQAGVFIDWLRNGCIESPQEVAQVLLANTIKLQR* 

Sequence 1663 
Contig_0636_pos_281_1405,

is similar to (with p-value 2.0e-92) 

>sp:sp|P54542|YQJE_BACSU HYPOTHETICAL 39.7 KD PROTEIN IN GLN
Q-ANSR INTERGENIC REGION. >gp:gp|D84432|BACJH642_260 Bacillu
s subtilis DNA, 283 Kb region containing skin element. NID:
g2627063. >gp:gp|Z99116|BSUB0013_102 Bacillus subtilis compl
ete genome (section 13 of 21): from 2395261 to 2613730. NID:
g2634723.

atgattaatcaaaaacgtttattagattgttttctagaattagtgcaaattgattcggaa 
acagggcacgaagaaacaattcaaccttatcttaaagatacgtttgaaaaaatggggctc 

catgttattgaagatgaagcttcaaaaaataatagattaggtgctaacaatcttatttgt
acgttaaaaagtaatataagtcatcagaatgtgccgaaaatttattttacaagccacatg 
gatactgtcgttccaggaaaaaacatccaacctgtagtaaaagaagatggatacgtttat 
agcgatggaactacgatactcggggcggacgataaagccggtcttgcggcaataattgaa 
gcgattaaacaaattaaggaatcaaatttgccacacggacagattcaaataattattacc 

gtgggagaggaatctggattagtaggtgctaaagcaatagatactcgccttcttgatgca
gatttcggctatgctgtagatgcaagtaaagatgttggaactactgttatcggtgctcca 
actcaagtaaagatttatacaactataaaagggaaaaccgcccatgcaagcacacctaaa 
aaaggtattagcgcaataaatattgcatcaaaagcaatcagtcgaatgaaattgggacaa 
gtcgatgcattaacaacagccaatataggtaaatttcacggaggttgtgccactaatatt 

atagctgatgaagtcactttagaggcagaagcacggtcacatgatgatcaaagcattaat
aaacaagtgaaacatatgaaagagactttcgaaacgacagcaaatgaattaggcggtcaa 
gctgaagtgttagttgaaaaaagttatccgggatttgaagttagtgaagctgacaaagta 
acacaatatgctatatctagtgcattagccctcggtctaaaaggtgatacttgtattgct
ggtggtggttcagacggcaacatcatgaatcaatatggcattccttctgtgattttagga 

gtaggatatgaaaacatacatactacttcggaaagaatagcaataaaggatatgtatatg
ctcacaagacaaataataaaaattattgagctagtagctgaataa

Sequence 1664 

MINQKRLLDCFLELVQIDSETGHEETIQPYLKDTFEKMGLHVIEDEASKNNRLGANNLIC
TLKSNISHQNVPKIYFTSHMDTVVPGKNIQPVVKEDGYVYSDGTTILGADDKAGLAAIIE
AIKQIKESNLPHGQIQIIITVGEESGLVGAKAIDTRLLDADFGYAVDASKDVGTTVIGAP
TQVKIYTTIKGKTAHASTPKKGISAINIASKAISRMKLGQVDALTTANIGKFHGGCATNI
IADEVTLEAEARSHDDQSINKQVKHMKETFETTANELGGQAEVLVEKSYPGFEVSEADKV
TQYAISSALALGLKGDTCIAGGGSDGNIMNQYGIPSVILGVGYENIHTTSERIAIKDMYM
LTRQIIKIIELVAE*

Sequence 1665 

Contig_0636_pos_1485_2891,

is similar to (with p-value 0.0e+00) 

>sp:sp|P14062|6PGD_SALTY 6-PHOSPHOGLUCONATE DEHYDROGENASE, D
ECARBOXYLATING (EC 1.1.1.44). >pir:pir|S04397|S04397 phospho
gluconate dehydrogenase (decarboxylating) (EC 1.1.1.44) - Sa
lmonella typhimurium >gp:gp|X15651|SEGNDB_1 S. enterica gnd
gene for 6-phosphogluconate dehydrogenase. NID: g47699. >gp:
gp|M64332|STYGNDA_1 S.typhimurium (strain LT2) 6-phosphogluc
onate dehydrogenase (gnd) gene, complete cds. NID: g154099.

atgacacaacaaattggagtagtgggtttagcagtaatggggaaaaacctagcttggaat 
attgaatcacgtggttatagtgtttctgtttataaccgatcaagacaaaaaactgatgaa 
atggttaaagaatcgcctggaagagaaatttacccaacatactcattagaagaatttgta 

gaatctttagagaaacctcgtaagattttattaatggtaaaagctggacctgcaacagat
gccactatagatggtttattacctttattagacgatgatgatattttaattgatggtggt 
aatactaattaccaagatacgattcgtcgaaataaagctttagctgaaagtagtattaac 
tttattggtatgggagtttctggtggagaaatcggcgcactcacgggcccttctttaatg
ccaggtggtcaaaaagatgcttataacaaagtcagcgatatcttggacgcaattgctgct 
aaggcacaagatggtgcttcatgtgtaacttacattggccctaatggtgcaggacattat
gttaagatggtacacaatggtatcgaatatgcagatatgcaattaattgctgaaagttat 
gcaatgatgaaagatttattaggcatgtcacataaagaaatttctcaaacttttaaagaa 
tggaatgctggagaacttgaaagttatttaatagaaattacaggtgatattttcaataaa 
ttagatgatgacaatgaagcacttgtagaaaaaatattagatactgcaggtcaaaaaggc
acaggtaaatggacttcaattaacgcactagaattaggtgttcctttaacaatcattaca 
gaatctgtatttgcgagattcatctcatcaattaaagaagaacgtgttactgcttctaaa 
tctttaaaaggacctaaagcacattttgaaggcgataaaaaaacattcttagaaaaaata 
cgtaaggcactttatatgagtaaaatatgctcatatgcacaaggtttcgctcaaatgaga 

aaagccagtgaagataatgagtggaatttgaaattaggcgaattagcaatgatttggcgt
gaaggttgtattattcgtgcacaattcctacaaaaaattaaagatgcctacgataataat 
gaaaacttacaaaacttattattagacccttacttcaaaaacattgttatggaatatcaa 
gatgcactacgtgaagtagtagctactagcgtgtacaatggcgtgccaacacctggtttt 
tcagcaagtataaattattatgatagttatcgctcagaggatttacctgcaaacttaatt 

caagcacaacgtgattactttggcgcacatacttatgaacgtaaagaccgtgaaggtatt
ttccatacacaatgggtagaagaataa 

Sequence 1666 

MTQQIGVVGLAVMGKNLAWNIESRGYSVSVYNRSRQKTDEMVKESPGREIYPTYSLEEFV
ESLEKPRKILLMVKAGPATDATIDGLLPLLDDDDILIDGGNTNYQDTIRRNKALAESSIN
FIGMGVSGGEIGALTGPSLMPGGQKDAYNKVSDILDAIAAKAQDGASCVTYIGPNGAGHY 
VKMVHNGIEYADMQLIAESYAMMKDLLGMSHKEISQTFKEWNAGELESYLIEITGDIFNK 
LDDDNEALVEKILDTAGQKGTGKWTSINALELGVPLTIITESVFARFISSIKEERVTASK 
SLKGPKAHFEGDKKTFLEKIRKALYMSKICSYAQGFAQMRKASEDNEWNLKLGELAMIWR 
EGCIIRAQFLQKIKDAYDNNENLQNLLLDPYFKNIVMEYQDALREVVATSVYNGVPTPGF
SASINYYDSYRSEDLPANLIQAQRDYFGAHTYERKDREGIFHTQWVEE* 

Sequence 1667 

Contig_0636_pos_3061_4716,
is similar to (with p-value 0.0e+00)

>pir:pir|S44188|S44188 alpha-glucosidase (EC 3.2.1.20) - Sta
phylococcus xylosus >gp:gp|X78853|SXMALRAG_2 S.xylosus malR
gene and malA gene. NID: g474175. 

atgaaaagaaattggtggaaagaagcagttgcatatcaagtatatccacgaagttttaat 

gatagtaatggagatggaataggtgatctacctggattaattgaaaaattagattatcta
gaaaatttaggaatagatgtcatttggttaagcccaatgtatccatcaccaaacgatgat 
aatggatatgatattagtgactacaaaggcattatgagtgaatttggtacaatgaacgat 
tttgatcaattgttatcaagcatacatcaaagagggatgaaattaatattagacttagtg 
gttaatcacacatcagatgaacacccttggtttattgaatcaaaatctagtaaaacaaat 

gcaaaaagagattggtatatttgggcagatcctaaaccggatggatctgaacctaataac
tgggaaagtatctttaatggttcaacttgggagtttgacgaatcgactaagcaatactat 
ttccatttatttagcaaaaagcagccagatttaaattgggaaaatccagatgtaagacaa 
gctgtgtttgaaatgatgaattggtggtttgaaaaaggtattgacggatttagagttgat 
gccattactcatattaaaaagaattttgaagcaggagatttacctgtacctgatggcaaa 

aaatttgctccagcatttgatgtagatatgaatcagccaggaatacaagaatggctccaa
gaaatgaaagataaatcgttaagtcggtatgacattatgactgtaggcgaggctaatggt 
gttactcctaatgatgctgaagaatgggtaggagaagaaaatgggaaatttaatatgata 
ttccagtttgaacatcttggtttatggagtactggcgatacgaaattcgatgttaaatcc 
tataaacaagtcttaaatcgttggcaaaagcaactagaaaatgtaggttggaatgcttta 

tttatcgaaaaccatgatcaaccacgtcgtgtttcaacctggggtgatgataaaaattat
tggtatgaatcagcaactagtcacgctactgcctactttttacaacagggcacacctttt 
atttaccaaggtcaagaaataggtatgactaattatccatttgaaagcattgaaagtttc 
aacgatgtcgcagtgaaaactgaatatcaaatagtcaaaaaagaaggtggagatgtcaat 
caattactagataaatataaaatggaaaaccgagacaatgcaaggactccaatgcaatgg 

aataattctatcaatgctggattcactactggtaagccatggtttcatgtaaaccctaac
tatacagaaattaatgttaaacaacaactaaatgataagttttcgatactttcttattat 
aaagcgttaattcaactaaaaaaatctgatttgatttacacctacggtaagtttaatatg 
gtcgatgctgaaaataagcaggtttttgcatatacacgcacatttaaaaacaatactgta 
ttaattgtagccaatctcacaaatgaagtatcagaactaaacctaccttttgaattagat
atttcatctgtagatataaaattgcataattatcacttaaatgatataaatttagaccat
attaaaccttatgaatcattcgtcgttgaaatataa 

Sequence 1668 

MKRNWWKEAVAYQVYPRSFNDSNGDGIGDLPGLIEKLDYLENLGIDVIWLSPMYPSPNDD
NGYDISDYKGIMSEFGTMNDFDQLLSSIHQRGMKLILDLVVNHTSDEHPWFIESKSSKTN
AKRDWYIWADPKPDGSEPNNWESIFNGSTWEFDESTKQYYFHLFSKKQPDLNWENPDVRQ 
AVFEMMNWWFEKGIDGFRVDAITHIKKNFEAGDLPVPDGKKFAPAFDVDMNQPGIQEWLQ 
EMKDKSLSRYDIMTVGEANGVTPNDAEEWVGEENGKFNMIFQFEHLGLWSTGDTKFDVKS 
YKQVLNRWQKQLENVGWNALFIENHDQPRRVSTWGDDKNYWYESATSHATAYFLQQGTPF
IYQGQEIGMTNYPFESIESFNDVAVKTEYQIVKKEGGDVNQLLDKYKMENRDNARTPMQW 
NNSINAGFTTGKPWFHVNPNYTEINVKQQLNDKFSILSYYKALIQLKKSDLIYTYGKFNM 
VDAENKQVFAYTRTFKNNTVLIVANLTNEVSELNLPFELDISSVDIKLHNYHLNDINLDH
IKPYESFVVEI* 

Sequence 1669 

Contig_0636_pos_6783_7460,

putative peptide of unknown function 

gtgggcattttagtatcggggtcagggatagcgagtgtacaaacaaatataactcacgca 
aaagaaagtcacgattcaactcctcaaaatattaaattagtgggaacgtatgatacttct
caagttgattccaaaacgatgaaacaatttaaagaaatagaaaaagaagataataatttc 
cacataactaaacatggaaataaagtcgttgtagaagacaaattacctaatccagagaat 
aaaacttcaagttattcagctgatggtagtgctgaaaataatacaaaagtaattaatttc 
tctgattttgttggtaatatggatgggaaagatgatggaaaaatatcggatgggataacc 
ttttatagtggtaaatcatataacggacaacacgatggtcaaaaagtaaaaaaagggact
catgtacattgtaatagatttaacggaacaaaatctgatcatagatactggtcaaaaaaa 
catcctagagcttatgtagatttttataaaagtgattgctggtatcacgccaaagcttat 
aaatgttcttccttgggaaaaatgactaaatgcgatggtttgaatagtatttatagaaaa 
ggtgtcaaagattgctcatcatggaaaggtaaacccaaacataaaaactggcctaaaaca 
gcatggtatagaaattaa

Sequence 1670 

VGILVSGSGIASVQTNITHAKESHDSTPQNIKLVGTYDTSQVDSKTMKQFKEIEKEDNNF 
HITKHGNKVVVEDKLPNPENKTSSYSADGSAENNTKVINFSDFVGNMDGKDDGKISDGIT 
FYSGKSYNGQHDGQKVKKGTHVHCNRFNGTKSDHRYWSKKHPRAYVDFYKSDCWYHAKAY
KCSSLGKMTKCDGLNSIYRKGVKDCSSWKGKPKHKNWPKTAWYRN*

Sequence 1671 

Contig_0636_pos_7464_0,

putative peptide of unknown function

atgaataaaatcttaaaaatattaataacttctattattgttatcattattaccttaaca 
gtttggacttttagtgtgattacttatcagaaacacaagagtgagaaaatcatcaatcac 
gttatagaacgtaagggttgggataaaaaaataaaaaatgaaaaaatgagttttaatatt 
ataatgggatatgctgaaaaagatattgtttttaaagatcaaccatatagtgagtatgag 

tataacgtgacaccagcaccatggacagatgataaagaatataaggtgtggggggaaaca
ga 

Sequence 1672 

MNKILKILITSIIVIIITLTVWTFSVITYQKHKSEKIINHVIERKGWDKKIKNEKMSFNI
IMGYAEKDIVFKDQPYSEYEYNVTPAPWTDDKEYKVWGETX

Sequence 1673 

Contig_0636_pos_5872_4982,
is similar to (with p-value 5.0e-20) 
>gp:gp|X92946|LLLPK214_18 Lactococcus lactis sp. lactis plas
mid pK214, complete sequence. NID: g2467210. 

gtgaaagttttggacgttatcaagcaaatacaacaggcaattgtatatattgaggatcgt 
ttgttagagccttttaatttgcaagaattaagtgattacgttggtctttctccgtatcat 
ttggatcaatcttttaagatgatagttggtcagtccccagaggaatatgcgcgtgcacgt 
aaaatgacaatagcagcaaacgatgtagttaatggagctagtcgattaatggatgttgct
aagaaatatcgttatgcgaattctaatgatttcgcaaatgattttagcgattttcatggt 
atctcacctattcaagctacaacaaaaaaagatgaactaaaaatacagcaacgactgtat 
ataaaattatcaacgactgaaaatgcaccctatacatacagacttcaagagactgatgat 

atttctttagttggctattcaagatttattcctactgagcaattatcaaatccatttaat
attccagactttttagaggatctattagtagatggttatattaaagaacttaaacgttat 
aatgatacgagcccgtatgaattatttgtagtcagctgtcctctggaacaaggtttagaa 
atatttgttggtgttccgagtgaacgttacccttcacatcttgaaagcagatttttacct 
ggtcgccattatgcattgtttaatttacaaggtgaaattgattatgctacaaacgaggct 

tggtattatattgagtctagcttgcaacttactttaccttatgagcgaaatagtttatat
gttgagatttatccacttgatatttcatttaatgacccattcactaagattcaattatgg 
ttgcctattaaacaagaaatctatgatttagatgaaggttatcaaaattaa 

Sequence 1674 

VKVLDVIKQIQQAIVYIEDRLLEPFNLQELSDYVGLSPYHLDQSFKMIVGQSPEEYARAR
KMTIAANDVVNGASRLMDVAKKYRYANSNDFANDFSDFHGISPIQATTKKDELKIQQRLY 
IKLSTTENAPYTYRLQETDDISLVGYSRFIPTEQLSNPFNIPDFLEDLLVDGYIKELKRY 
NDTSPYELFVVSCPLEQGLEIFVGVPSERYPSHLESRFLPGRHYALFNLQGEIDYATNEA 
WYYIESSLQLTLPYERNSLYVEIYPLDISFNDPFTKIQLWLPIKQEIYDLDEGYQN*

Sequence 1675

Contig_0637_pos_2774_4171, 

is similar to (with p-value 1.0e-83)

>gp:gp|U96107|SCU96107_5 Staphylococcus carnosus N5,N10-meth
ylenetetrahydromethanopterin reductase homolog, SceB precurs
or (sceB) and putative transmembrane protein genes, complete 
cds, and putative Na+/H+ antiporter NhaC (nhaC) gene, parti 
al cds. NID: g2735503. 

atgactcaaaagtatagatatcctacttttttagaatctatttctactattttagttatg 

gttgtcgttgtagtaattggttttgttttctttaatgtcccgatacaaatattattatta
atttcttcagcttatgcagcattgattgcacatagagtgggattaaaatggaaggattta 
gaagaggggattactcatcgattgagcacggcgatgccagctatctttattattttagct 
gttggaatcattgtaggaagttggatgtattctggaacagttccagcgttaatttactat 
ggacttaaatttttaaacccaagttatttattagtatctgcatttataatcagtgcaatg 

acttcaatcgctacaggaacggcttggggatcggcatctacagcaggcattgcactcata
tcaattgctaatcaattaggtgtgccagcaggtatggctgctggtgccattattgcaggg 
gcggtttttggtgataaaatgtctccattatctgatactacaaatttggcagctcttgta 
actaaagttaatatttttgctcacattaaatcgatgatgtggacaacaatccctgcttca 
ataataggattggctatatggtttattgttggattacaatataagggagacgcaaataca 

caacaaattcaaaatctattaaaagaattaacaacaatttataacttgaatttttgggta
tggatcccacttattatcatagttttatgtttaatatttagaatctctacagtaccgtca 
atgcttatctctagtatcagtgctttagttattggaacattcgatcatcaatttaatatg 
aaagatggttttaaagcttcttttgatggatttaatcatacaatgctacaccagtctcat 
atttcagataatgctaagacgttgattgagcagggtggtatgatgagtatgactcaaatc 

45 attgtaactatattttgtggttatgcttttgctggtattgttgaaaaggcaggttgttta 
gacgtaattttagagacaatagctaaaggcgtaaagtcagttggaacactaatattaata 
actgtagtttgtagtattatgctagtatttgcagctggagttgcttcaatagttattatt 
atggtaggcgtacttatgaaagatatgttcgaaaagatgaatgtctcaaagtcagtgtta 
tctcgtacacttgaagattcaagtacaatggtattgccactcattccatggggcacatct 

50 ggtatatattatgcacaccaacttaatgtttcagttgatcagttctttatatgggcaatc 
ccatgttacttatgtgcattcattgcaataatttatggctttacaggtataggaattaaa 
aaaataagtagaaaataa 

Sequence 1676 

MTQKYRYPTFLESISTILVMVVVVVIGFVFFNVPIQILLLISSAYAALIAHRVGLKWKDL
EEGITHRLSTAMPAIFIILAVGIIVGSWMYSGTVPALIYYGLKFLNPSYLLVSAFIISAM
TSIATGTAWGSASTAGIALISIANQLGVPAGMAAGAIIAGAVFGDKMSPLSDTTNLAALV
TKVNIFAHIKSMMWTTIPASIIGLAIWFIVGLQYKGDANTQQIQNLLKELTTIYNLNFWV
WIPLIIIVLCLIFRISTVPSMLISSISALVIGTFDHQFNMKDGFKASFDGFNHTMLHQSH
ISDNAKTLIEQGGMMSMTQIIVTIFCGYAFAGIVEKAGCLDVILETIAKGVKSVGTLILI
TVVCSIMLVFAAGVASIVIIMVGVLMKDMFEKMNVSKSVLSRTLEDSSTMVLPLIPWGTS
GIYYAHQLNVSVDQFFIWAIPCYLCAFIAIIYGFTGIGIKKISRK*

Sequence 1677

Contig_0637_pos_4306_5049,

is similar to (with p-value 3.0e-99)

>gp:gp|Y16431|SAU16431_6 Staphylococcus aureus dpj, alr gene
s, partial kdpC gene and 4 ORF's. NID: g3850845.

10 atgggccgaattggaatgaaagatatagatgaatataaagaagtggttgatttaattaat 
aaaagagatcatttagtttttgaaggggttttcacacattttgcgagtgctgatgaacct 
ggaagttctatgaatgaacaatatattttgttcaaagagatggttaatcaagttgagaaa 
ccaatttatattcattgtcaaaattctgctggatcactactcatggatggtcaattttgt 
aatgcaataagattaggaatctctctttatggatactatccttcagaatatgttaaagat 

15 aatgtgaaagttcatttaagaccgagtgcgcagttagtatcagaaaccgttcaagtcaaa 
acgcttaaagttggggaaactgttagttatggacgtacatttattgctgatgaagaaatg 
acaattgcaattttacctattgggtatgccgacggatatttaagatcgatgcaaggtgca 
ttcgtcaatgttaacgggagtcaatgtgaagtcattggacgcatttgtatggaccaaatg 
atagttaaggttccttctcatgtaaaaacgggtgaaaaagtaatacttatggataatcac 

20 gttgattcaccacaatcagctgaagccgtagcaaataaacaaggtacaattaactacgaa 
gtattatgtaatttatcaagacgtcttccaagaatatattattatgataataatgaagag 
gttactaacgaattgttaaaatag 

Sequence 1678

MGRIGMKDIDEYKEVVDLINKRDHLVFEGVFTHFASADEPGSSMNEQYILFKEMVNQVEK
PIYIHCQNSAGSLLMDGQFCNAIRLGISLYGYYPSEYVKDNVKVHLRPSAQLVSETVQVK
TLKVGETVSYGRTFIADEEMTIAILPIGYADGYLRSMQGAFVNVNGSQCEVIGRICMDQM
IVKVPSHVKTGEKVILMDNHVDSPQSAEAVANKQGTINYEVLCNLSRRLPRIYYYDNNEE
VTNELLK*

Sequence 1679

Contig_0637_pos_1119_769,

putative peptide of unknown function 

atgttaatttatttattaagtttatttactggtatcattggtcctttgattatttggtta 
35 ctcaagcgtaaagagtcccgacttattgatgtatctgggaaaacgtatctgaactatttt 
atttcttatactatctattcaacagtaggcgtgatatgtatgtttatgattgttccttta 
atgaatataagtgaaagtttagccatcttattattaattttgctgctggtggtagtcttc 
atcttattggcattgttaataatgtcatttgtgtgtacaattattgcttgcgtaaaatat 
atgtctggcaaaacttacactatcccactcacgataccttttataaaataa 


Sequence 1680 

MLIYLLSLFTGIIGPLIIWLLKRKESRLIDVSGKTYLNYFISYTIYSTVGVICMFMIVPL 
MNISESLAILLLILLLVVVFILLALLIMSFVCTIIACVKYMSGKTYTIPLTIPFIK*

Sequence 1681

Contig_0637_pos_0_447,

is similar to (with p-value 2.0e-31)

>gp:gp|U96107|SCU96107_3 Staphylococcus carnosus N5,N10-meth
ylenetetrahydromethanopterin reductase homolog, SceB precurs
or (sceB) and putative transmembrane protein genes, complete
cds, and putative Na+/H+ antiporter NhaC (nhaC) gene, parti
al cds. NID: g2735503.

atgaaaaaaatcaaaacaatctcgacattggtagctggacttggtatagcatttctaggt 
cacacaacacatgcagatgcggctgaaaataacaatcaacaacaaagtacatataactat 
55 agtacaactgaagtatcattttctaattcaggaaatttatatacttctggccaatgtact 
tggtatgtttatgataaaactggtggaaaaatcggatcaacatgggggaatgcaaatagc 
tgggcaactgcagctcaagcagcaggattcactgtaaataatacacctgaagaaggtgca 
attatgcaatcatctgaaggtgctttcggacatgttgctttcgttgaaagtgtcaataat 
gatggttctattactgtatcagaaatgaactatgatggtggtccattcgctataagcaca
cgaacaatctctgccagtgaaTATAGT

Sequence 1682

MKKIKTISTLVAGLGIAFLGHTTHADAAENNNQQQSTYNYSTTEVSFSNSGNLYTSGQCT 
WYVYDKTGGKIGSTWGNANSWATAAQAAGFTVNNTPEEGAIMQSSEGAFGHVAFVESVNN
DGSITVSEMNYDGGPFAISTRTISASEYS 

Sequence 1683

Contig_0640_pos_4683_6494,

is similar to (with p-value 8.0e-81)

>gp:gp|D88209|D88209_1 Bacillus licheniformis DNA for Pz-pep
tidase, complete cds. NID: g1651215.

atgagtgaaggtttacctttgagagaagaagttccggtaaaagaaacttgggatttgaaa 
gatttatttacaagtgatcaagcattctatcaaacattggaacaagtagtacaaatgtct 

15 ttagattttaatcatacatattatcagaaacttaataacatagaaacaatagaaaaggca 
ttagatgaatatgaaaggatacttatagaaatagatcgtttatataattatccagaactt 
agattaagcgttgatacgtctaatgaagaagcacaaaaagttaacgcaaaacttaatacg 
acttctggaaaacttgctggtttattatcttttgttgattccgagattttggagttaccc 
gatgagataataagcgaattgaggtctcaaacaaaataccctcattttattaaacaactt 

20 caagatcgtaagccttatcaattatctgctgatgttgaaaaagtattagctacattaaca 
ccaacattgagaagtccgtttgaattgtatggtactacaaagagtttggatattaatttt 
gaatcgtttgattatgagggtgttacctatccattggattatgcaacatttgaaaatgaa 
tatgaagatcatccatctcctgaatttagacgtaaaagttttagagcttttagtgatgca 
ttacgacaatatcaacatacgacggccgcaacatataatatgcaagtccaacaagaaaag

25 attgaagcggatttacgaggatatgattctgttattgattatctactacaagatcaagaa 
gtaacaaaagatatgttcgatagacaaattgatgtcattatgagtgatttagccccagtt 
atgcaaaagtatgcaaaaattattcaacgtgtacataacctggataaaatgcgatttgag 
gatttaaaaatttcaatagaccctaactttgaaccagaaatatcaattgaagaatcgaaa 
aaatacatttatggagcgctcaaagtacttggtgatgattatgtcaaaatgttagagtct 

30 gcctatgattaccgttggattgattttgctcagaataaaggaaaagatactggagcatat 
tgtgcaagtccatacattacacattcatatgtatttatttcatggactgggaaaatggct 
gaaacattcgttcttgcgcatgaattaggacatgcaggtcattttacattagcgcagaat 
catcaaaatttgttggaatctgaagcgtctatgtattttgtagaagcaccttccacaatg 
aatgaaatgttgatggcaaattacttatttaatagtagtaataatcctcgatttaaacgt 

35 tgggttattggttcgattttatctcgaacttattatcataatatggttacccacctttta 
gaagcagcttatcaacgtgaagtgtatagccgagtcgacaatggagagtcattaactgcc 
ccactgctaaatgaaataatgttgaacacttataaagcatttttcggtgacactgttgaa 
atgacagatggggttgaattaacatggatgagacaaccacattattatatgggattgtac 
tcatatacgtactctgctggattgacaattggtacagttgtatcacaatgtatcaagaaa 

40 gaaggtcaacctgctgttgatcgctggttaaaaacgctacaagctggtggtagtcaatct 
ccaattgaattggcgcaaatagctggcgttgatattacgactgacgcccctttaaaagag 
acaattaactatatttcaaatttagtagatgaattagaagtattaacatatcaaataaaa 
gaaaattcataa 

Sequence 1684

MSEGLPLREEVPVKETWDLKDLFTSDQAFYQTLEQVVQMSLDFNHTYYQKLNNIETIEKA 
LDEYERILIEIDRLYNYPELRLSVDTSNEEAQKVNAKLNTTSGKLAGLLSFVDSEILELP 
DEIISELRSQTKYPHFIKQLQDRKPYQLSADVEKVLATLTPTLRSPFELYGTTKSLDINF
ESFDYEGVTYPLDYATFENEYEDHPSPEFRRKSFRAFSDALRQYQHTTAATYNMQVQQEK 

IEADLRGYDSVIDYLLQDQEVTKDMFDRQIDVIMSDLAPVMQKYAKIIQRVHNLDKMRFE
DLKISIDPNFEPEISIEESKKYIYGALKVLGDDYVKMLESAYDYRWIDFAQNKGKDTGAY 
CASPYITHSYVFISWTGKMAETFVLAHELGHAGHFTLAQNHQNLLESEASMYFVEAPSTM 
NEMLMANYLFNSSNNPRFKRWVIGSILSRTYYHNMVTHLLEAAYQREVYSRVDNGESLTA 
PLLNEIMLNTYKAFFGDTVEMTDGVELTWMRQPHYYMGLYSYTYSAGLTIGTVVSQCIKK 

EGQPAVDRWLKTLQAGGSQSPIELAQIAGVDITTDAPLKETINYISNLVDELEVLTYQIK
ENS* 

Sequence 1685 

Contig_0640_pos_9870_9118,

is similar to (with p-value 1.0e-31)

>sp:sp|P46339|YQGH_BACSU PROBABLE ABC TRANSPORTER PERMEASE P
ROTEIN IN SODA-COMGA INTERGENIC REGION (ORF72). >gp:gp|D8443
2|BACJH642_159 Bacillus subtilis DNA, 283 Kb region containi
ng skin element. NID: g2627063. >gp:gp|D58414|BACPST_2 Bacil
lus subtilis DNA for homologues of the E. coli pst gene prod
ucts. NID: g903302. >gp:gp|Z99116|BSUB0013_208 Bacillus subt
ilis complete genome (section 13 of 21): from 2395261 to 261
3730. NID: g2634723.

10 gtgccattatctaaattctttttatctggtacttggaacccaactggttcatcaccagaa 
tttggaatttgggcactcatcatcggcacattgaaaattactgttattgctacaatcgta 
gcagttccgattggtttaggagcagctatctaccttaatgaatatgcatctgatcgttca 
cgtagaatcattaaaccaatattagaaattttggctgggattcctacaattgtatttggc 
ttctttgctttgacatttgttacacctatattgagaaatctaattcctaacttgggagag 

15 tttaattcaatcagtcccggtattgttgtgggtataatgattgtccctatgattacaagt 
atgagtgaagatgcaatgtcatctgtacctgataaaattcgtgaaggtgcatttggatta 
ggcgcaactaaatttgaagtcgctacaaaagttgtattaccagctgcgacttctggcgtt 
gttgcttcaattgtattaggtatatcaagagcaataggtgaaacaatgatcgtttcttta 
gctgctggtagttcaccaacatcatctctaagtttaactagttcaattcaaacgatgaca 

20 ggatatattgttgaaattgctacaggtgatgcagcatttggttctgatatttactacagt 
atttacgctgtaggttttacacttttcattttcactttaattatgaatttattatcacaa 
tggatctctaaacgattcagagaggagtattaa 

Sequence 1686

VPLSKFFLSGTWNPTGSSPEFGIWALIIGTLKITVIATIVAVPIGLGAAIYLNEYASDRS
RRIIKPILEILAGIPTIVFGFFALTFVTPILRNLIPNLGEFNSISPGIVVGIMIVPMITS
MSEDAMSSVPDKIREGAFGLGATKFEVATKVVLPAATSGVVASIVLGISRAIGETMIVSL
AAGSSPTSSLSLTSSIQTMTGYIVEIATGDAAFGSDIYYSIYAVGFTLFIFTLIMNLLSQ
WISKRFREEY*


Sequence 1687 

Contig_0640_pos_9116_8232,

is similar to (with p-value 2.0e-27)

>sp:sp|P46340|YQGI_BACSU PROBABLE ABC TRANSPORTER PERMEASE P
ROTEIN IN SODA-COMGA INTERGENIC REGION (ORF73). >gp:gp|D8443
2|BACJH642_160 Bacillus subtilis DNA, 283 Kb region containi
ng skin element. NID: g2627063. >gp:gp|D58414|BACPST_3 Bacil
lus subtilis DNA for homologues of the E. coli pst gene prod
ucts. NID: g903302. >gp:gp|Z99116|BSUB0013_207 Bacillus subt
ilis complete genome (section 13 of 21): from 2395261 to 261
3730. NID: g2634723.

atgtctacacattcaaatactgctaacaaaacattgattgataaagatgccgtagaaaaa 
aaaatttcttctcgtgataggaaaaactcggtaaacaaatggttatttttattatgtaca 
ttaattgggctcattgttttagtagcactattaattcaaactttcgttaaaggggcggga 

45 catctaactcccgaatttttcactaatttttcatcttcaacaccagcagatgctggtatt 
aaaggggctttagtaggttctatttggttaatcttaagtattattccaattagtatcatt 
ttaggaataggtacagcaatttatttagaagaatacgcaagagacaatatttttacacaa 
atcgtaaaggtgagtatatctaatttagctggtgttccttcaattgttttcggtttacta 
ggttatacattatttgtaggcgcggcaggtttaggtaatagcgtgctagccgctgcgctt 

50 acaatgtcactactaatcttgcctgttattatcgttgctagtcaggaagctatcagagca 
gttcctagttcagtcagagaagcatcatatggtcttggtgctaataaatggcaaacaatt 
agaagagttgttttacctgcagcattacctggtattttaacaggtttcattttatcttta
tcacgcgcattaggagaaacagcaccacttgtaatgataggtatacctacgatactttta 
gcaacaccaagtggattactcgaccaatttctctgcgttaccaactcaaatttatacatg 

55 ggcaaaaatgcctcaagcagaattccaaaacgttgcatcagcaggtattatcgttctact 
cgttatcttattattgatgaacactgtagcgatacttcttcgtaa 

Sequence 1688 

MSTHSNTANKTLIDKDAVEKKISSRDRKNSVNKWLFLLCTLIGLIVLVALLIQTFVKGAG
HLTPEFFTNFSSSTPADAGIKGALVGSIWLILSIIPISIILGIGTAIYLEEYARDNIFTQ
IVKVSISNLAGVPSIVFGLLGYTLFVGAAGLGNSVLAAALTMSLLILPVIIVASQEAIRA
VPSSVREASYGLGANKWQTIRRVVLPAALPGILTGFILSLSRALGETAPLVMIGIPTILL
ATPSGLLDQFLCVTNSNLYMGKNASSRIPKRCISRYYRSTRYLIIDEHCSDTSS*


Sequence 1689 

Contig_0640_pos_8117_7242,

is similar to (with p-value 7.0e-95)

>sp:sp|P46342|YQGK_BACSU HYPOTHETICAL ABC TRANSPORTER ATP-BI
NDING PROTEIN IN SODA-COMGA INTERGENIC REGION (ORF75). >gp:g
p|D84432|BACJH642_162 Bacillus subtilis DNA, 283 Kb region c
ontaining skin element. NID: g2627063. >gp:gp|D58414|BACPST_
5 Bacillus subtilis DNA for homologues of the E. coli pst ge
ne products. NID: g903302. >gp:gp|Z99116|BSUB0013_205 Bacill
us subtilis complete genome (section 13 of 21): from 2395261
to 2613730. NID: g2634723.
atggctaattcacaagtagcagaaaaagagaaactagacgcacaaacaaataatcaagac 
tcagttgccacaatagtaactactgaaaacaataagaaaaatacaattccagacagtgaa 
aagaagattgtttattcaactcaaaatctagatttatggtatggagaaaatcatgcgtta 

20 caaaacattaatttagatatattggaaaataatgtaactgcaataatcggaccttctgga 
tgtggtaaatctacatacatcaaagctttaaatagaatggtcgaattagttccatctgtg 
aaaactgcaggtaaaattttgtatcgtgaccaaaatatatttgatgcaaagtattctaaa 
gaaaagctacgtactaacgttggaatggtctttcaacaacctaacccattccctaagtca 
atttatgataatattacttatggccctaagactcacggtattaaaaacaaaaaaattcta 

25 gatgaaatcgtagaaaaatcattacgtggcgctgcaatatgggatgaattaaaagataga 
ttgcatacaaatgcttatggattatcaggtggacaacaacaacgtgtttgtatagctaga 
tgtttagcaattgaaccagatgtcattttaatggatgaacctacatcagcattagatcct 
atttctacgttaagagttgaagaacttgtacaagaattaaaagaaaattactcaattatc
atggttacgcacaacatgcaacaagctgcgcgtgtttcagataaaactgctttcttctta 

30 aatggatatgtcaatgaatatgatgatactgataaaatcttttcaaatcctgccgacaaa 
caaactgaagattatatatctggtcgttttggataa 

Sequence 1690 

MANSQVAEKEKLDAQTNNQDSVATIVTTENNKKNTIPDSEKKIVYSTQNLDLWYGENHAL 
QNINLDILENNVTAIIGPSGCGKSTYIKALNRMVELVPSVKTAGKILYRDQNIFDAKYSK
EKLRTNVGMVFQQPNPFPKSIYDNITYGPKTHGIKNKKILDEIVEKSLRGAAIWDELKDR
LHTNAYGLSGGQQQRVCIARCLAIEPDVILMDEPTSALDPISTLRVEELVQELKENYSII
MVTHNMQQAARVSDKTAFFLNGYVNEYDDTDKIFSNPADKQTEDYISGRFG*

Sequence 1691

Contig_0640_pos_4420_4100,

putative peptide of unknown function 

atgccaaactatttgtggattaccattcttggtatgattttattaactgtattttacaca 
cttgtattaaataaatggttccagtctgcaatcattacttttgtagttttagcagtactt
45 gccttttttataccaaattttcaaaacatttcatatcaaccactgcttggatatgcagga 
ttcttaggcataatgagcttaatcataagctttcttatttggtatttttctagaaactgg 
agaaaaaatcgtagaaaaataaaattggaaaaagagattcgcaaatatgatgatgaagag 
tcacttcgtcgtcataaataa 

Sequence 1692

MPNYLWITILGMILLTVFYTLVLNKWFQSAIITFVVLAVLAFFIPNFQNISYQPLLGYAG 
FLGIMSLIISFLIWYFSRNWRKNRRKIKLEKEIRKYDDEESLRRHK* 

Sequence 1693

Contig_0640_pos_3939_2920,

is similar to (with p-value 0.0e+00)

>gp:gp|AF076684|AF076684_1 Staphylococcus aureus oligopeptid
e transporter putative membrane permease domain (opp-2B), ol
igopeptide transporter putative membrane permease domain (op
p-2C), oligopeptide transporter putative ATPase domain (opp-
2D), and oligopeptide transporter putative ATPase domain (op
p-2F) genes, complete cds. NID: g3800824.

gtgaaaggatgccaacatatgtttaaaatgataatttataaactttcacaaatgattgtc 
5 gtactatttatattaactacaatcacatttatattaatgaaactctctccaggtaatcct 
gtagacaaaattttacatcttgatatttcgcatgtatctaatgagcaaatagaaacgaca 
gagaataagcttggcttaaataatcctatttttattcaatggtgggactggttaaatcaa 
ttgtttcattttgatttaggaacaagttatcaaacaagcgagcctgtaattagggaaata 
gcaaattatcttggtcctacacttattattacttttggtacgcttatagtgtcattagtt 

10 atttctataccattagggattatagcagcggtttactaccataaaatttgggataggata 
atccgtgttatgacatcattatccgtaagcctaccatcattttttatcggtcttatctta 
ttatatatatttagcttgaagttgaatattttaccaacttcagatgaggggcgtttcgtt 
tcatatattttaccaataattaccatgagtattggaatgtgtgcttattatattcgattt 
attcgttctactttattagaacaatatcaaacacctatagttgaatcgtctcgtctcaga 

15 ggtatgcccgaaagatatatactttttcaagatatccttaaacctacgatactaccaatc 
atacctctattaggattatccattggtagtttgataggtggaacagtagtcattgaaaat 
ttatttgatattcctgggttaggctattttttagttgacagtataaagtcgagagattat 
ccagtcattcaaggttgtgtattatttattggtttctttgtagtgattataaacacaatt 
gcagatttactttcattacttatcgatcctaaacaacgttatgctattactcagaaagaa 

20 acatcaaagtttaaatggtttaattcacatagaaaagaaggtcgtaacgatgaagtttaa 



Sequence 1694 

VKGCQHMFKMIIYKLSQMIVVLFILTTITFILMKLSPGNPVDKILHLDISHVSNEQIETT
ENKLGLNNPIFIQWWDWLNQLFHFDLGTSYQTSEPVIREIANYLGPTLIITFGTLIVSLV
ISIPLGIIAAVYYHKIWDRIIRVMTSLSVSLPSFFIGLILLYIFSLKLNILPTSDEGRFV
SYILPIITMSIGMCAYYIRFIRSTLLEQYQTPIVESSRLRGMPERYILFQDILKPTILPI
IPLLGLSIGSLIGGTVVIENLFDIPGLGYFLVDSIKSRDYPVIQGCVLFIGFFVVIINTI
ADLLSLLIDPKQRYAITQKETSKFKWFNSHRKEGRNDEV*


Sequence 1695 

Contig_0640_pos_2795_2076,

is similar to (with p-value 2.0e-83)

>gp:gp|AF076684|AF076684_2 Staphylococcus aureus oligopeptid
e transporter putative membrane permease domain (opp-2B), ol
igopeptide transporter putative membrane permease domain (op
p-2C), oligopeptide transporter putative ATPase domain (opp-
2D), and oligopeptide transporter putative ATPase domain (op
p-2F) genes, complete cds. NID: g3800824.

40 atgagtatgaatcatttacttggcacagatgattacggacgtgatttattcagtcgttta 
gtcgtgggctctcgtgcaacattgtttgttacactacttactttacttttcactgtagtg 
gttggagtacctttagggttacttgcaggctataaaaaaggttggattgatacgattatc 
atgcgaattattgatataggattaagcataccagaattcgttattatgattgccttagca 
agtttttttcatcctagtctttggaatttagtaatagctattacaatcataaaatggatg 

45 aattatactcgcgtgacaagagggattgtcaataccgaaatgaatcaatcgtatatacag 
atggcacaattttttaatgtctcaactttgaatatcttatttaaacacttattaccaaaa 
gttttaccatctatatttgttattatgatagttgattttggaaaaatcattttatacatt 
agttcattatcatttttaggtttaggtgcacaaccaccatctccagagtggggggcaatg 
ttacaagcagggcgtgaatttattacttcacatcctatcatgattatcgctccagcatct 

50 ttgatatcaggtacaatattgatatttaatttaactggtgatgctgtaagagatcgttta 
ttagaacaaagaggtgtaaaagttgaaacttttaacaataaaaaatctaaacatcaatga 



Sequence 1696 

MSMNHLLGTDDYGRDLFSRLVVGSRATLFVTLLTLLFTVVVGVPLGLLAGYKKGWIDTII
MRIIDIGLSIPEFVIMIALASFFHPSLWNLVIAITIIKWMNYTRVTRGIVNTEMNQSYIQ
MAQFFNVSTLNILFKHLLPKVLPSIFVIMIVDFGKIILYISSLSFLGLGAQPPSPEWGAM
LQAGREFITSHPIMIIAPASLISGTILIFNLTGDAVRDRLLEQRGVKVETFNNKKSKHQ*

Sequence 1697

Contig_0640_pos_1936_1337,

is similar to (with p-value 1.0e-64)

>gp:gp|AF076684|AF076684_3 Staphylococcus aureus oligopeptid
e transporter putative membrane permease domain (opp-2B), ol
igopeptide transporter putative membrane permease domain (op
p-2C), oligopeptide transporter putative ATPase domain (opp-
2D), and oligopeptide transporter putative ATPase domain (op
p-2F) genes, complete cds. NID: g3800824.

atgagtttcgatgaatttaaaatgcaaggtcaaaatacttctggtatcaagcaactttta 
ggtaaacatatcggctatatctctcaaaattatgctcaaagttttaatgaatatactcgt 
ttggataaacaacttatagctatatatcgttatcattttaatgtttctaaggataatgca 
ttgaaaaagataaaaaaagctttaacttgggttaacttaaatgatgaatcaatcattaat 

15 aaatatagtttccaactttcaggaggacaattagagcgagttaatattgctagcgtttta 
atgttagatccagaattaattattgcagatgaacctgttgcatctttagatgtagtgaac 
ggtcatcaaataatgcaactccttcaacacattgttaaagatcatcataatactgtatta 
cttatcactcataacatgaatcatgtcctcaaatatgctgattattttaatgtaatgaga 
aatggcatgatgattgaatctggagaaatagacaaattatttaatcaccatcatcttcat 

20 cggtatacagaacaattattaaactatagaagcaagctgcaaaaggaggacaacatctaa 



Sequence 1698 

MSFDEFKMQGQNTSGIKQLLGKHIGYISQNYAQSFNEYTRLDKQLIAIYRYHFNVSKDNA
LKKIKKALTWVNLNDESIINKYSFQLSGGQLERVNIASVLMLDPELIIADEPVASLDVVN
GHQIMQLLQHIVKDHHNTVLLITHNMNHVLKYADYFNVMRNGMMIESGEIDKLFNHHHLH
RYTEQLLNYRSKLQKEDNI*

Sequence 1699

Contig_0640_pos_1334_636,

is similar to (with p-value 9.0e-72)

>gp:gp|AF076684|AF076684_4 Staphylococcus aureus oligopeptid
e transporter putative membrane permease domain (opp-2B), ol
igopeptide transporter putative membrane permease domain (op
p-2C), oligopeptide transporter putative ATPase domain (opp-
2D), and oligopeptide transporter putative ATPase domain (op
p-2F) genes, complete cds. NID: g3800824.

atgattcaatttgatcatgtagattattcatatcatcgaaaacagcctgttttaaaagat 
attaatataagtattcaacgtggtgaaaaaataggggttttaggggaaagcggtgctgga 

40 aaaagtactattggttctttaatattaggtcaattaaagccaacaaaaggaaaaataagt 
atcgattcaggaaaggttctacctatttttcaacatgcgacagaaagttttgatcgtcaa 
ttcacgattgaacagtctttgagagagccacttttattttatcgacaattaatacgacaa 
aatatcaaaaatatcattcttaactatttaattgaatttaatttgtctacagatctaata 
acaaagtttcctcaagaggtaagtggtgggcaactacaaagattaaatattatacgttct 

45 ctcttagcacaaccagatatattggtttgtgatgaaataacttcgaacttagacgtcatg 
gccgaacaaaatgtaatcaatattttacttaacgaaaaaaacattcaaaataaaacacta 
atcgtcatctcgcatgatttatctgttttacaaaggttaacgaataggataatagttatc 
aaagacggtcaaatagtagatgattttaaaagtaaagatttatttagccataaaagacat 
ccatatacaaaactattaattcaaacgtatgaatattga 


Sequence 1700 

MIQFDHVDYSYHRKQPVLKDINISIQRGEKIGVLGESGAGKSTIGSLILGQLKPTKGKIS 
IDSGKVLPIFQHATESFDRQFTIEQSLREPLLFYRQLIRQNIKNIILNYLIEFNLSTDLI 
TKFPQEVSGGQLQRLNIIRSLLAQPDILVCDEITSNLDVMAEQNVINILLNEKNIQNKTL 
IVISHDLSVLQRLTNRIIVIKDGQIVDDFKSKDLFSHKRHPYTKLLIQTYEY*

Sequence 1701 

Contig_0641_pos_551_1597,

is similar to (with p-value 1.0e-34)

>sp:sp|P09122|DP3X_BACSU DNA POLYMERASE III SUBUNITS GAMMA A
ND TAU (EC 2.7.7.7).

atggatcaagcaatagcgtttggagacgaacgacttactttacaagatgctttaaatgtt 
acaggtagtgttgatgaagcggcattaaatgagttatttaatgacattgtaaaaagtgat 
5 gttaaagccgcatttaatagatatcatcattttatttcagaaggtaaagaagtcaacaga 
ctcattaatgatatgatttactttgttagagatacaattatgaataaaacgtctaacgaa 
tccgttcattttgaatcacttattcatttcgacttagatatgttatacaggatgatagat 
atcatcaatgatacactagtatccattaggttcagtgtaaatcaaagtgttcattttgaa 
gtgttgctagttaaacttgcagaaatgattaagacacagcctcaaactgtacaaaatgta 

10 gcaacagcatcggtagctaatgaaccagataatgagatgttattacaacgtttagaacaa 
cttgaaaatgagcttaaaaccttaaaagaacaagggatcaaaactaataaagttagtcaa 
caacctaagaaaccaacacgtacgattcaacgatctaaaaatacgttttctatgcaacaa 
atagcgaaagtattagacaaagcaaacaaagatgatatcaaattgttgaagaaccattgg 
caagaagtgattgatcatgcaaaaagtaatgataaaaagtctttagtaagtttgctactg 

15 aattcagaaccagtagcagctagtgaagatcatgtgttagttaaatttgatgaagaaatt 
cattgtgaaatagtaaataaagatgatgaaaagagaaacaatattgaaagtgtagtttgt 
aatatagttaataaaactgtcaaagtagttggagtgccggctgaccaatggctgagagtg 
agagcagagtacttacaaaatcgtaacaccaatgaaacacatcaaagcgaaaaacaaagc 
acacaacagtctcaacaaatagatattgctcaaaaagctaaagacttatttggtgaggaa 

20 actgtacacttagttgatgaagactga 

Sequence 1702 

MDQAIAFGDERLTLQDALNVTGSVDEAALNELFNDIVKSDVKAAFNRYHHFISEGKEVNR
LINDMIYFVRDTIMNKTSNESVHFESLIHFDLDMLYRMIDIINDTLVSIRFSVNQSVHFE
VLLVKLAEMIKTQPQTVQNVATASVANEPDNEMLLQRLEQLENELKTLKEQGIKTNKVSQ
QPKKPTRTIQRSKNTFSMQQIAKVLDKANKDDIKLLKNHWQEVIDHAKSNDKKSLVSLLL
NSEPVAASEDHVLVKFDEEIHCEIVNKDDEKRNNIESVVCNIVNKTVKVVGVPADQWLRV
RAEYLQNRNTNETHQSEKQSTQQSQQIDIAQKAKDLFGEETVHLVDED*

Sequence 1703

Contig_0641_pos_2010_2606,

is similar to (with p-value 3.0e-87)

>sp:sp|P24277|RECR_BACSU RECOMBINATION PROTEIN RECR. >gp:gp|
D26185|BAC180K_85 B. subtilis DNA, 180 kilobase region of re
plication origin. NID: g467326. >gp:gp|X17014|BSRECM_3 Bacil
lus subtilis dnaZX and recR genes and two unidentified readi
ng frames. NID: g453238. >gp:gp|Z99104|BSUB0001_21 Bacillus
subtilis complete genome (section 1 of 21): from 1 to 213080
. NID: g2632267.

40 atgcattatccagaacctatatcaaagcttatcgatagttttatgaaactgccaggcatt 
ggaccaaagacggctcaacgtctggcttttcatactttagatatgaaagaagacgatgtt 
gttaagtttgctaaagcactagttgatgttaaaagagaacttacctattgtagtgtttgt 
gggcatattacagaaaatgatccttgttatatatgtgaagataaacagcgagatcgttct 
gtcatatgtgtagttgaagatgacaaggatgtcatagcaatggaaaaaatgcgtgaatat 

45 aaaggtttatatcacgtgcttcatggttcgatttcaccaatggatggtattgggcctgaa 
gacatcaatatacctgcattagttgaacgcctcaaaaacgatgaggtgaaagagcttata 
ttagctatgaatcctaacctagaaggcgagtctactgcaatgtatatatctaggttggtt 
aaaccaattgggattaaagtcacaagactggcacaaggtttatctgtaggcggcgattta 
gaatatgctgatgaagtgactttatctaaagcaattgcaggtagaacggaaatgtaa 


Sequence 1704 

MHYPEPISKLIDSFMKLPGIGPKTAQRLAFHTLDMKEDDVVKFAKALVDVKRELTYCSVC 
GHITENDPCYICEDKQRDRSVICVVEDDKDVIAMEKMREYKGLYHVLHGSISPMDGIGPE
DINIPALVERLKNDEVKELILAMNPNLEGESTAMYISRLVKPIGIKVTRLAQGLSVGGDL
EYADEVTLSKAIAGRTEM*

Sequence 1705 

Contig_0641_pos_4554_4898,

putative peptide of unknown function

gtgacaaaccggaggaaggtggggatgacgtcaaatcatcatgccccttatgatttgggc
tacacacgtgctacaatggacaatacaaagggcagcgaaactgcgaggtcaagcaaatcc 
cataaagttgttctcagttcggattgtagtctgcaactcgactatatgaagctggaatcg 
ctagtaatcgtagatcagcatgctacggtgaatacgttcccgggtcttgtacacaccgcc 
5 cgtcacaccacgagagtttgtaacacccgaagccggtggagtaaccatttggagctagcc 
gtcgaaggtgggacaaatgattggggtgaagtcgtaacaaggtag 

Sequence 1706 

VTNRRKVGMTSNHHAPYDLGYTRATMDNTKGSETARSSKSHKVVLSSDCSLQLDYMKLES 
LVIVDQHATVNTFPGLVHTARHTTRVCNTRSRWSNHLELAVEGGTNDWGEVVTR*

Sequence 1707 

Contig_0641_pos_3048_2737,

putative peptide of unknown function

gtgttatacagaaaaatggttgctaaaataaaacttaaacgcctatacaaaccctattca
caaactttatttttgttgttcattttaaaatatttttactattttactataatccttgtt 
ttttattccaaatactttacatcatccttgggtaaagaaggattgattcttatcctcatt 
aataataaatgtaattataaaagcctttccgtgaactcaataagtctgaattctaaaaag 
cgaaacagaaatcttatcatattcttctttgtttcaatccattaccgaaccccaacttgc 

20 tttgtctgttga 

Sequence 1708 

VLYRKMVAKIKLKRLYKPYSQTLFLLFILKYFYYFTIILVFYSKYFTSSLGKEGLILILI 
NNKCNYKSLSVNSISLNSKKRNRNLIIFFFVSIHYRTPTCFVC* 

Sequence 1709

Contig_0642_pos_3898_4338,

putative peptide of unknown function 

atgggtgaagttaaacatttacaaatcaattacaaaacagatgagcttttcgctgatttc 
30 agagaatttggtaacaagaacctatatatgattgaggagttaaagggacaaatgattgat 
gcaagttccgattctcctttttatggaatttttgttggtaataaattagttgcgagaatg 
gcattacttgataaaggagaagtagaagaaacctatttccctaatagcaatgactatatt 
cttctttggaaattagaagtattagatacgtaccaaagacgtggatatgctaagcaacta 
ttaaactttgctaaagaaaataaaaaaccaattaaagctattgctagaaataactcaaaa 
35 gaattctttttaaaacagggttttaaagatgtagaaaccaaaaatcctgaggggcacgac 
atcttaatatggaatccataa 

Sequence 1710 

MGEVKHLQINYKTDELFADFREFGNKNLYMIEELKGQMIDASSDSPFYGIFVGNKLVARM 
ALLDKGEVEETYFPNSNDYILLWKLEVLDTYQRRGYAKQLLNFAKENKKPIKAIARNNSK
EFFLKQGFKDVETKNPEGHDILIWNP* 

Sequence 1711 

Contig_0642_pos_3719_2106, 

is similar to (with p-value 9.0e-71)

>sp:sp|P55342|YLLA_BACSU HYPOTHETICAL 62.6 KD PROTEIN IN FTS
L 5' REGION. >gp:gp|Z68230|BSYLLSPO_1 B.subtilis yllA, yllB,
yllC, ftsL, pbpB and spoVD genes. NID: g1122757.
atgaagtgtaatacgttaaagttaactgaacaggatcagttcattaataaaataaaaaat 

50 agtgaatctcaaattacatctttttatgaatatgatgcagcgaaaaaggaaagcttttat 
agaagattaaaaacacctaataatggaagggaatttcatttatcacgagtgattaaatct 
tacatgaatgaattaaagttaacacatcagcagctgaataacatagatgctttagctgat 
ggtgcaaaagtagtgattggtggacaacaagctggtctttttgggggtccactgtatacg 
tttcataagattttctcaattataactttaagtcgtcaactttctgaggaatatgatact 

55 cctattgtacctgtgttttggattgcaggtgaggaccatgattttgaagaggtgaatcat 
acgtatgcattcaataataaagaaactaccttgaaaaaagtaaagtatcatacgatgaca 
ccaccagatagtaatgtttcaagatatactcctgataagaacgaattaaaagcatcatta 
aatcacttttttaaagaaatgaaggaaactgtacatactcaagatgtttatcaaatgtgc 
gtcaatattattaatcaatttgattcatggattgatatttttaagggattgatacatgaa
gtgtttaaggattatgggattttacttattgatgctcaataccctgaattaagacagatg
gagaaaccgttgtttaaagagatattagaaaagaggaatcaagtcgatcaatcttttcgt 
gaaactcagatacgaacaactcaacaacaacttccatcaatgatacaaacagagacaaac 
acacatttatttatccatgaagacggaatgagacagcttttaaattttgatggcacttat 
5 tttaaactgaataaaactgagaaacgttacacgaaacaaaatttattagatattatagaa 
agagagcctgaaagaatttctaataatgttgttactcgtccagttgtagaggaatggttg 
tttaacacagtagcatttatcggaggtccaagcgaaatcaaatattgggcagaattaaaa
ggtgtttttgatacgttaaatgtagaaatgcctattgttatgccaagattaagaatcacg 
tatttgtatgctagaactaaaaagttattaaaacaatataatttatcgatagagtctgtc 

10 attgctaatggagtagaacaggaacgtcaacgttttgttcgtgaaaaagcatcaaataat 
tttataaatgaagtagaagaaatgaaaattcagcaacaagaactttataacaatttattc 
acctatgtggaaaataatcatgacaaccaacttcttttagaaaaaaataatcaaattcat
ctcaatcagtacgattatttaatcaaacggtacttactgaatattgaaagagaaaatgat 
ataagtatgcgacagtttagagaaattagtgaaacactccatccaatgggtggtctacaa 

15 gaaagagtttggaatccacttcaaattatgaatgattttgggatagatgtgttcagtccc 
tccacctatccaccactttcttactcgtttgatcatttgattataaatccttga 

Sequence 1712 

MKCNTLKLTEQDQFINKIKNSESQITSFYEYDAAKKESFYRRLKTPNNGREFHLSRVIKS
YMNELKLTHQQLNNIDALADGAKVVIGGQQAGLFGGPLYTFHKIFSIITLSRQLSEEYDT
PIVPVFWIAGEDHDFEEVNHTYAFNNKETTLKKVKYHTMTPPDSNVSRYTPDKNELKASL
NHFFKEMKETVHTQDVYQMCVNIINQFDSWIDIFKGLIHEVFKDYGILLIDAQYPELRQM
EKPLFKEILEKRNQVDQSFRETQIRTTQQQLPSMIQTETNTHLFIHEDGMRQLLNFDGTY
FKLNKTEKRYTKQNLLDIIEREPERISNNVVTRPVVEEWLFNTVAFIGGPSEIKYWAELK
GVFDTLNVEMPIVMPRLRITYLYARTKKLLKQYNLSIESVIANGVEQERQRFVREKASNN
FINEVEEMKIQQQELYNNLFTYVENNHDNQLLLEKNNQIHLNQYDYLIKRYLLNIEREND
ISMRQFREISETLHPMGGLQERVWNPLQIMNDFGIDVFSPSTYPPLSYSFDHLIINP*

Sequence 1713

Contig_0642_pos_1962_1531,

is similar to (with p-value 8.0e-70)

>sp:sp|O07319|YLLB_STAAU HYPOTHETICAL 17.4 KD PROTEIN. >gp:g
p|U94706|SAU94706_1 Staphylococcus aureus strain ATCC 8325-4
cell wall/cell division gene cluster, yllB, yllC, yllD, pbp
A, mraY, murD, divIB, ftsA and ftsZ genes, complete cds. NID
: g2149889.

atgttcatgggagaattcgatcatcaattggatacaaaaggacgtatgattataccgtcc
aaatttcgttatgatctaaatgaacgttttattatcacaagaggccttgataaatgttta 
tttggttacactctagaagagtggcagcaaattgaagagaagatgaaaaccttacctatg 
40 acaaaaaaagacgcgcgtaaatttatgcgtatgttcttctcaggtgctgtagaagtagaa 
ttagataaacaagggcgtattaatattccgcaaaatttaagaaaatatgccaatttaagt 
aaggaatgtacagtaattggtgtctcaaatcgtatagagatttgggacagagaaacttgg 
aatgatttctatgatgaatctgaagaaagtttcgaagacattgctgaagatttaatagat 
tttgatttttaa 


Sequence 1714 

MFMGEFDHQLDTKGRMIIPSKFRYDLNERFIITRGLDKCLFGYTLEEWQQIEEKMKTLPM 
TKKDARKFMRMFFSGAVEVELDKQGRINIPQNLRKYANLSKECTVIGVSNRIEIWDRETW
NDFYDESEESFEDIAEDLIDFDF* 

Sequence 1715

Contig_0642_pos_1495_581,

is similar to (with p-value 0.0e+00)

>gp:gp|U94706|SAU94706_2 Staphylococcus aureus strain ATCC 8
325-4 cell wall/cell division gene cluster, yllB, yllC, yllD
, pbpA, mraY, murD, divIB, ftsA and ftsZ genes, complete cds
. NID: g2149889.

atgttaaacgaaaccattgattatttaaatattaaagaagatggtgtgtatgttgactgt 
acgttgggtggagcaggacatgccctctatttacttaatcaattaaatgataaaggtaga
cttattgcgattgatcaagatttaacagccatagaaaatgcgaaagaagttttaaaagaa
catttgcacaaagtcacttttgttcataacaactttcgagaattaacaaatattttaaat 
gaattagaaattgaaaaagtagatggtatttattatgacttaggtgtttcaagcccgcaa 
ttggatgtgcctgaaagaggctttagttatcacaatgatgcgaaactagatatgcgaatg 
5 gatcaaacacaatcactttctgcgtatgaagtagttaatcaatggtcttatgaagcatta 
gttaggattttctttcgttacggtgaagagaaattttctaaacaaattgcacgcagaatt 
gaagcccatcgagaacaacaacctatagaaacaactttagaactagttgatgtcattaaa 
gaaggcataccagcgaaagcaagacgaaaagggggacatcctgcgaaacgcgtgttccaa 
gctattcgaattgctgtgaatgatgagttatcagcttttgaagattcagttgagcaagcc 
10 attgaatgtgtgaaggtcggaggtagaatttcagttattactttccactctttggaagat 
cgtttgtgtaaacaaattttccaagagtttgagaaaggtccagacgtaccaagaggtctc 
cccgttattcctgaagcatatacacctaagttaaaacgagtaaatcgtaaaccgattacc 
gctactgatgacgatttaaacgaaaacaatcgagcacgtagcgccaagttacgcgtagca 
gaaatattaaaataa 


Sequence 1716 

MLNETIDYLNIKEDGVYVDCTLGGAGHALYLLNQLNDKGRLIAIDQDLTAIENAKEVLKE 
HLHKVTFVHNNFRELTNILNELEIEKVDGIYYDLGVSSPQLDVPERGFSYHNDAKLDMRM 
DQTQSLSAYEVVNQWSYEALVRIFFRYGEEKFSKQIARRIEAHREQQPIETTLELVDVIK 
EGIPAKARRKGGHPAKRVFQAIRIAVNDELSAFEDSVEQAIECVKVGGRISVITFHSLED
RLCKQIFQEFEKGPDVPRGLPVIPEAYTPKLKRVNRKPITATDDDLNENNRARSAKLRVA
EILK* 

Sequence 1717

Contig_0645_pos_1187_1612,

putative peptide of unknown function 

atgactaatcaaaaccaaattgaacaacgattctataatattattgaaaatgctgatcaa 
tattatctctacttatatgattggcgtgtgattgaaagtgaatctaatgagagcgacttt 
aaagacgttgatatatggattaactttgaagatgaagcgttaatagatgaatatatttgt
30 gtgattgctaaagttgatgatgaaggtggcaatattaatcatatgatttctcaaaactta 
cgtcacaaatatgtttggtctacaccgttcttgatgagagttgaggagaactttagacca 
tacatgcacattatggaacatgactttaaaaagggcccattgaaaatcagaatgaaatat 
aatgcagtatatttagttgaaatatataaaaaagataaaataaataaaaggcgtagcaca 
acttaa 


Sequence 1718 

MTNQNQIEQRFYNIIENADQYYLYLYDWRVIESESNESDFKDVDIWINFEDEALIDEYIC
VIAKVDDEGGNINHMISQNLRHKYVWSTPFLMRVEENFRPYMHIMEHDFKKGPLKIRMKY 
NAVYLVEIYKKDKINKRRSTT* 

Sequence 1719

Contig_0645_pos_3058_3531,

putative peptide of unknown function 

atgactgctaatgattggatagaccggttggaattaatttcgcatcctgaaggcggatat 
45 tttaaagaaacaatgagaggcgacggtaaaggaagagcgtcttttagcagtatttatttt 
ctattaacacagcgcgatatatcacattttcatagaatagatgcagatgaagtatggtat 
tatcatgctggtcagacgttgaagattcatatgataacccctaaaggtgaatatcatact 
gttaaattaggtagagatatagattgtggtgagtgtttacaatattgtgtgccaaaaggg 
acaatttttgcttctactttagatagtgcagagggatatagtctagttggatgtatgtgt 
50 caacctggatttgaatacgagcattttgaacttttaacacaagaatatttgattcgtcaa 
tatccacaatatgaaagcataataaaaagattagctatatctcaagaagattaa 

Sequence 1720 

MTANDWIDRLELISHPEGGYFKETMRGDGKGRASFSSIYFLLTQRDISHFHRIDADEVWY
YHAGQTLKIHMITPKGEYHTVKLGRDIDCGECLQYCVPKGTIFASTLDSAEGYSLVGCMC
QPGFEYEHFELLTQEYLIRQYPQYESIIKRLAISQED*

Sequence 1721 

Contig_0645_pos_6262_6861,

putative peptide of unknown function

gtgactgtgaaagtaaaatatatagacaaacgtcactggcgtcgcctggtagagagagaa 
tatacagaggttaaagttaataataatagatttaaaggtattataggcttggtcacgatg 
aaaaaggttcgtgagcctttagaggtgacggtagttggacaaaacataatagttgcagat 
5 gacaattataaatggttacagatactaccggataagaagcggtatagtatgactgtcatg 
tttgataataaagggaatccattagaatactactttgacattaatataaagaacattact
caaaaaggaaatgcacgtaccatcgatttatgtttagacgttctcgttctacctaatggt 
gagtatgaacttgtagatgaagatgatttaatgtacgcactacaaaataaacaaatttca 
aagaagcaatatcatgaggcatatattatcgcccatcaattaatgattgaaatagaagat 
10 aatttttcagaaatacaagataaggttatgcgttgttatcataaaatcaatcataaagca 
cagaaaatgaaacataaacgtccctataaagctaaaaagaaatcacaccgacgacattaa 

Sequence 1722 

VTVKVKYIDKRHWRRLVEREYTEVKVNNNRFKGIIGLVTMKKVREPLEVTVVGQNIIVAD
DNYKWLQILPDKKRYSMTVMFDNKGNPLEYYFDINIKNITQKGNARTIDLCLDVLVLPNG
EYELVDEDDLMYALQNKQISKKQYHEAYIIAHQLMIEIEDNFSEIQDKVMRCYHKINHKA
QKMKHKRPYKAKKKSHRRH*

Sequence 1723

Contig_0645_pos_711_385,

putative peptide of unknown function 

gtgaaagatgatttaaacgatgattttgaagattctttagagtatttggagccattagat 
catgatgcatatattgtgaggttaaactttactggtgaaaatacgactgagcctatcata 
tcttatatgacgacgacgcataacatagatgtgaatattcttgaagcagatattaagaat
actaaaaacggttcgtttggatttttagttattcacataccacatataagtgaagaacat
ttcaagcaatttaaacataatcttcacacaaaagctaatctttttagtaggtatggctgg
ggaaagagatttaacgaaaacacctga

Sequence 1724

VKDDLNDDFEDSLEYLEPLDHDAYIVRLNFTGENTTEPIISYMTTTHNIDVNILEADIKN 
TKNGSFGFLVIHIPHISEEHFKQFKHNLHTKANLFSRYGWGKRFNENT*

Sequence 1725 
Contig_0645_pos_383_69,

is similar to (with p-value 5.0e-54) 

>gp:gp|Y14370|SAY14370_2 Staphylococcus aureus RF3, murE, ypfP genes. NID: g3256221.

atggggagagtagcatgtcgagcagattatgttatttttactccagataatcctgctaac 
gatgatcctaaaatgttgacagctgaattagctaaaggtgcaacgcataacaattatata
gagtttgatgaccgtgcagaaggtattagacacgcgattgatattgctgaaccaggtgat
acagttgttttggcctcaaaaggtcgagagccttatcaaattatgcctggtcatgttaaa 
gtcccacatcgcgatgacttaattggcttaaaagcagcatatcaaaaatttggtggtgga
cctcttgaggattaa 


Sequence 1726 

MGRVACRADYVIFTPDNPANDDPKMLTAELAKGATHNNYIEFDDRAEGIRHAIDIAEPGD
TVVLASKGREPYQIMPGHVKVPHRDDLIGLKAAYQKFGGGPLED* 

Sequence 1727

Contig_0646_pos_5424_4078,

is similar to (with p-value 0.0e+00) 

>sp:sp|P16618|HEM1_BACSU GLUTAMYL-TRNA REDUCTASE (EC 1.2.1.-) (GLUTR).
>pir:pir|A35252|A35252 5-aminolevulinate synthase (EC 2.3.1.37) - Bacillus subtilis
>gp:gp|M57676|BACHEMAXC_1 Bacillus subtilis hemAXCDBL gene cluster. NID: g143034.
>gp:gp|Z99118|BSUB0015_82 Bacillus subtilis complete genome (section 15 of 21): from 2795131 to 3013540. NID: g2635200.
>gp:gp|Z75208|BSZ75208_87 B. subtilis genomic sequence 89009bp. NID: g1769994.

atgcattttgttgcaattagcataaatcatcgaacagctgatgtaacattaagagagcaa 
gttgcttttagagatgatgccttacgattagcacatgaagatttatatgaaactaaagca 
attttagaaaatgtcattttatctacatgtaatcgtactgaagtatacgctattgttgat 
caagttcatacaggacgttattatatacaaagatttttagcgcgctcttttggatttgag
gtagatgatattaaagatatgtcggaagttaaagtgggggacgatgcagttgaacattta 
ttgcgtgtcacttctggcttagattcaattgtgcttggtgaaacacaaattttaggacaa 
atgcgcgatgcatttttcttagcgcaaaatactggtacaactggaacgatttttaatcat 
ttatttaaacaagcgattacttttgctaaaaaagcacacagtgaaacagacattgcagat 

aatgctgtgagtgtttcttatgctgctgttgaattagctaaaaaggtatttggaaaatta
aaaagtaaacatgctgtcgttattggagcaggggaaatgggtgaattatcactcttaaat 
cttttaggttctggaatttcaaatgtaacaattgttaatcggacattatctaaagctaaa 
attttagccgaaaaacacaatgtttcatatgattcactttcagcattaccatctttatta 
gaaacaacggacatagtaattagttctacaagtgctgaagactatatcatcactaattct 

atggtgaaaacaatttcagaaactagaaaactggattcattagttctgattgatattgcg
gttccacgagacattgaaccagggattgatgcaattacaaatatttttaattatgatgtt 
gatgatttgaaagatttggtagatgccaatttaagagaacgtcaattagctgctgaaact
attgcaggacaaatacctgaggagattgattcacacaacgaatgggttaatatgcttggt 
gttgtacctgtaatcagagctttacgtgaaaaagctatgaatatccaagcagaaactatg 

gaaagtattgatcgtaaattgccagatctctctgaaagagaacgtaaagtcatttcgaaa
catacaaaaagtattatcaatcaaatgttaaaagatcctatcaaacaggctaaggaatta 
agtactgataaaaaaagtaatgaaaaattagagctatttcaaaacatatttgatattgaa 
gccgaagatcctcgtgaaaaagcaaagttagaaaaagagagtagagcaaaggaaatctta 
gcgcatcgaatatttagttttgaataa 


Sequence 1728 

MHFVAISINHRTADVTLREQVAFRDDALRLAHEDLYETKAILENVILSTCNRTEVYAIVD 
QVHTGRYYIQRFLARSFGFEVDDIKDMSEVKVGDDAVEHLLRVTSGLDSIVLGETQILGQ
MRDAFFLAQNTGTTGTIFNHLFKQAITFAKKAHSETDIADNAVSVSYAAVELAKKVFGKL
KSKHAVVIGAGEMGELSLLNLLGSGISNVTIVNRTLSKAKILAEKHNVSYDSLSALPSLL
ETTDIVISSTSAEDYIITNSMVKTISETRKLDSLVLIDIAVPRDIEPGIDAITNIFNYDV 
DDLKDLVDANLRERQLAAETIAGQIPEEIDSHNEWVNMLGVVPVIRALREKAMNIQAETM 
ESIDRKLPDLSERERKVISKHTKSIINQMLKDPIKQAKELSTDKKSNEKLELFQNIFDIE 
AEDPREKAKLEKESRAKEILAHRIFSFE* 


Sequence 1729 

Contig_0646_pos_3867_3244,

is similar to (with p-value 1.0e-26)

>sp:sp|P16645|HEMX_BACSU HEMX PROTEIN.
>pir:pir|B35252|B35252 hypothetical protein (hemA 3' region) - Bacillus subtilis
>gp:gp|M57676|BACHEMAXC_2 Bacillus subtilis hemAXCDBL gene cluster. NID: g143034.
>gp:gp|Z99118|BSUB0015_81 Bacillus subtilis complete genome (section 15 of 21): from 2795131 to 3013540. NID: g2635200.
>gp:gp|Z75208|BSZ75208_88 B. subtilis genomic sequence 89009bp. NID: g1769994.

gtgccactaggttctatatttgacgtttttttctctttaacttggattattatatcaatt 
tcgttgatattgaatttgattaaagtaatgaatttttctgtttttttccttaatttgatt 
ggatttatcctaatgagtttaaatacttttcagcctgaacattatcaaacgcaaattcaa 
caaattgcagtaattaatgaattattacttgttcatatagcacttgcagtattaagttat 

gcattttttgcaatcgcatttgtaaattcattactctacattattcaatatcgaaattta
aaggagaaaaatttcgatcaaaattactttagaattggtagtgttgctactttagaaacc 
atcgttttctattcaacacttgttgcatggattatccttatattaagtacgattttaggt 
gcacaatggggtatctttgcagtcggtaaacaaatatttatagatccaaaagtaatattt 
tcaacaattattaatttattatatggtgtttatattttcattagaataaaaaaatggata 

tcacaaagaaatcttatctactttaacattatattattttgtttgtgtatgattaattta
ttctttttaactcattttagataa 

Sequence 1730 

VPLGSIFDVFFSLTWIIISISLILNLIKVMNFSVFFLNLIGFILMSLNTFQPEHYQTQIQ 
QIAVINELLLVHIALAVLSYAFFAIAFVNSLLYIIQYRNLKEKNFDQNYFRIGSVATLET
IVFYSTLVAWIILILSTILGAQWGIFAVGKQIFIDPKVIFSTIINLLYGVYIFIRIKKWI
SQRNLIYFNIILFCLCMINLFFLTHFR* 

Sequence 1731

Contig_0646_pos_3190_2264,

is similar to (with p-value 0.0e+00) 

>gp:gp|U89396|SAU89396_1 Staphylococcus aureus hemCDBL gene cluster: porphobilinogen deaminase (hemC), uroporphyrinogen III synthase (hemD), d-aminolevulinic acid dehydratase (hemB) and GSA-1-aminotransferase (hemL) genes, complete cds. NID: g2589180.

atgcgtaaattaattgttgggtcgcgaagaagtaaattagcgctaacacaaagtcaacaa 
tttatagataaattaaaatttatcgatccgtctttggatattgaaataaaagaaattgta 

actaaaggcgacaaaattgtagataaacaattatccaaagttggaggtaaaggacttttt
gttaaggaaatccaaaatgaattatttaataaagagatagatatggcgattcattctcta 
aaagatgtaccaagtatgattcctgacggtcttaccttaggatgtattcctgatagagaa 
attccttttgatgcctatatagcaaaaaatcatataccattacaagaattgtctgagggc 
agcattgtaggtacaagttctttacgtcgtggcgctcaaattttatcaaaatacccacat 

ttaaaaattaagtggattcgtggaaacattgatactcgattaaaaaaattagagactgaa
gattatgatgctattatattagctgctgctggattaaaacgcatgggttggtcagataat 
attgttacgacttatcttgatcgagatatattactgccagctatagggcagggtgcactt 
ggtattgagtgtaggagtgatgacaaagaacttttagatttactatctaaagtacacaat 
catgatgtagcacaatgtgtgactgctgaacgtacttttctatcagaaatggatggcagt 

tgtcaggttcctataggtggatatgcaacaattgctcaagataaccaaattgaatttaca
ggactgattatgtctccagatggtaaggaaagatatgagcatacagcattgggtactgat 
cctgtaaaattgggtatagaagtgagtcaagtacttaaaaaacaaggtgcttatgacata 
attaaaaaattaaacgaagcagaataa 

Sequence 1732

MRKLIVGSRRSKLALTQSQQFIDKLKFIDPSLDIEIKEIVTKGDKIVDKQLSKVGGKGLF 
VKEIQNELFNKEIDMAIHSLKDVPSMIPDGLTLGCIPDREIPFDAYIAKNHIPLQELSEG
SIVGTSSLRRGAQILSKYPHLKIKWIRGNIDTRLKKLETEDYDAIILAAAGLKRMGWSDN
IVTTYLDRDILLPAIGQGALGIECRSDDKELLDLLSKVHNHDVAQCVTAERTFLSEMDGS

CQVPIGGYATIAQDNQIEFTGLIMSPDGKERYEHTALGTDPVKLGIEVSQVLKKQGAYDI
IKKLNEAE* 

Sequence 1733 

Contig_0646_pos_2230_1550,

is similar to (with p-value 5.0e-76)

>gp:gp|U89396|SAU89396_2 Staphylococcus aureus hemCDBL gene cluster: porphobilinogen deaminase (hemC), uroporphyrinogen III synthase (hemD), d-aminolevulinic acid dehydratase (hemB) and GSA-1-aminotransferase (hemL) genes, complete cds. NID: g2589180.

atgaaaccagttatagttatgacgcagacgaatgaagttcatagtcatttagttgatatt 
atccataagccttttatccaactaaaacaacttcattttaatgaaaaattgcttgatcat 
agctacgactggcttattttttcgtctaaaaacgcagtaaaatacttttatccttattta 
aaaaacgttaaagttaaaaaggtagctgttataggtgataagacagctcagtattgtaat 

gaattaggtattagtgtcgactttgtgccacgtgatttttctcaagaaggctttttggac
gagtttaagattagcgaacaacatttattgttgccctcaagtgaaaaagcacgtcctaaa 
ttagttcaacaattgagcaaatataatgaagtcgttaaaattgatttatatagaccagta 
ccgaattttaaaaatataagtcaagttaagtctcttgttagaaaacatcaaatagacgca 
gtgactttttctagttcctctgcagttgaattttatttcaaagaggacaatgtgcctgaa 

tttgatcattattttgctatcggtaagcaaactgctaggaccattttaaaattcaataca
tctgtaaaagtggcaaataaacaaacattagattcacttattgataaaataatagaaagt 
agggaacaaaatgaaatttga 

Sequence 1734 
MKPVIVMTQTNEVHSHLVDIIHKPFIQLKQLHFNEKLLDHSYDWLIFSSKNAVKYFYPYL
KNVKVKKVAVIGDKTAQYCNELGISVDFVPRDFSQEGFLDEFKISEQHLLLPSSEKARPK 
LVQQLSKYNEVVKIDLYRPVPNFKNISQVKSLVRKHQIDAVTFSSSSAVEFYFKEDNVPE 
FDHYFAIGKQTARTILKFNTSVKVANKQTLDSLIDKIIESREQNEI*


Sequence 1735 

Contig_0646_pos_1518_586,

is similar to (with p-value 0.0e+00) 

>sp:sp|P50915|HEM2_STAAU DELTA-AMINOLEVULINIC ACID DEHYDRATASE (EC 4.2.1.24) (PORPHOBILINOGEN SYNTHASE) (ALAD) (ALADH).
>gp:gp|U89396|SAU89396_3 Staphylococcus aureus hemCDBL gene cluster: porphobilinogen deaminase (hemC), uroporphyrinogen III synthase (hemD), d-aminolevulinic acid dehydratase (hemB) and GSA-1-aminotransferase (hemL) genes, complete cds. NID: g2589180.

atgcgtgatttagtaagagaaactcatgttagaaaagaagatttaatatatccaatattt 
gtagttgagcaagatgatataaaaagtgaaattaaatcactaccaggcatataccaaatt
agtttaaatttattgcatgaagagattaaagaggcatatgatttaggtattagagcaatc 
atgttcttcggtgtgccaaatgacaaagacgacattggatctggtgcatatgatcataat 

ggagttgttcaagaagcgacacgaatatctaagaatttatataaggatttacttattgtt
gcagatacttgtctttgtgaatacacagaccacggacactgtggcgttattgacgatcat 
acgcatgatgtagacaatgataaatcacttccattacttgtaaaaacagctatttctcaa 
gttgaagctggagctgacatcattgctccaagtaatatgatggatggttttgttgctgaa 
attcgtgaaggccttgatcaagcgggatatcaaaatattcctatcatgagttatggtatt 

aaatatgcatcaagctttttcggtccattcagagatgctgcagattcagcaccttctttt
ggggatagaaaaacctatcaaatggatcctgcaaaccgtttagaggcattaagagaattg 
gaaagtgatcttaaagaaggttgcgatatgatgatagttaaaccatctttaagttatcta 
gatattattagagatgtaaaaaataatacgaacgtgccagtcgtagcatacaacgttagt 
ggagaatatagtatgacaaaagcagcagcgttaaatggttggatagatgaagagaaaatt 

gttatggaacaaatgatatctatgaaacgtgcaggtgctgatttaataattacttatttt
gcaaaagatatctgtcgttatttagataaatag 

Sequence 1736 

MRDLVRETHVRKEDLIYPIFVVEQDDIKSEIKSLPGIYQISLNLLHEEIKEAYDLGIRAI
MFFGVPNDKDDIGSGAYDHNGVVQEATRISKNLYKDLLIVADTCLCEYTDHGHCGVIDDH
THDVDNDKSLPLLVKTAISQVEAGADIIAPSNMMDGFVAEIREGLDQAGYQNIPIMSYGI
KYASSFFGPFRDAADSAPSFGDRKTYQMDPANRLEALRELESDLKEGCDMMIVKPSLSYL
DIIRDVKNNTNVPVVAYNVSGEYSMTKAAALNGWIDEEKIVMEQMISMKRAGADLIITYF
AKDICRYLDK* 


Sequence 1737 

Contig_0646_pos_0_535,

is similar to (with p-value 6.0e-93) 

>gp:gp|U89396|SAU89396_4 Staphylococcus aureus hemCDBL gene cluster: porphobilinogen deaminase (hemC), uroporphyrinogen III synthase (hemD), d-aminolevulinic acid dehydratase (hemB) and GSA-1-aminotransferase (hemL) genes, complete cds. NID: g2589180.

atggagcaagctgagaaattaatgcctggcggtgttaacagtcccgtaagagcatttaaa 
tcagtagacacaccagctatttttatggatcatggtgaaggatctaaaatatatgatatt
gatggaaatgaatacattgattatgtgctaagttggggcccattaattctgggacataaa 
aatcaacaagttatatccaaattacatgaagcagtagataaaggtacaagcttcggcgct 
tcaacacttcaagaaaataaacttgctgaacttgtgattgaccgtgtaccttcaattgaa 
aaagtaagaatggtttcctcaggaactgaagctactttagacacacttcgtttagctagg 
ggttatacaggacgtaataaaattataaaatttgaagggtgttatcatggacacagtgat
tctttattgattaaagcaggatcaggtgttgcaacactaggtttacctgattcaccaggc 
gtccctgaaggtattgctaaaaacactatcacggtgccatataatgatttaTGCC 

Sequence 1738 
MEQAEKLMPGGVNSPVRAFKSVDTPAIFMDHGEGSKIYDIDGNEYIDYVLSWGPLILGHK
NQQVISKLHEAVDKGTSFGASTLQENKLAELVIDRVPSIEKVRMVSSGTEATLDTLRLAR
GYTGRNKIIKFEGCYHGHSDSLLIKAGSGVATLGLPDSPGVPEGIAKNTITVPYNDLCX 

Sequence 1739

Contig_0648_pos_234_1580,

is similar to (with p-value 1.0e-95)

>sp:sp|Q45493|YKQC_BACSU HYPOTHETICAL 61.5 KD PROTEIN IN ADEC-PDHA INTERGENIC REGION.
>gp:gp|AF012285|AF012285_29 Bacillus subtilis mobA-nprE gene region. NID: g3282109.
>gp:gp|Z99111|BSUB0008_125 Bacillus subtilis complete genome (section 8 of 21): from 1394791 to 1603020. NID: g2633699.
atgaaggcccgaaatattaaaaagaaagtacgttactatactgtaaaccatgattcaatt 
atgagatttaaaaatgttaacgtgagtttctttaatacgacacatagcattcctgatagc 

ttaggcgtatgtattcatacttcgtatggttctatagtttatactggagagtttaagttt
gatcaaagtttgcatggacattatgctccagacttgaaacgaatggcagaaattggtgat 
gagggtgtgttcgcattaatcagtgattcaacagaagctgaaaagcctggatataacacg 
cctgaaaatattattgaacatcacatgtatgatgcttttgccaaggttaaaggtagactt 
attgtatcatgctatgcttcaaacttcgttcgtattcaacaagtgcttaacattgcaagt 

caacttaatcgtaaagtgtcatttttaggtcgttcacttgaaagttcgtttaacatagca
cgtaaaatgggatactttgatataccaaaagatttattaatacctattaacgaagtggaa 
aattatcctaaaaatgaagtgattattattgctacaggtatgcaaggtgaaccagtagaa 
gcattaagtcaaatggctcgcaaaaagcataaaattatgaacatagaagaaggagattca 
atattcctagcaattactgcttcagctaatatggaggttattattgcagatacattaaat

gagttagtgcgtgctggagcacatataattccaaacaacaagaaaattcatgcgtcaagt
catggttgtatggaagaattgaaaatgatgttaaatattatgaaacctgaatattttgta 
cctgttcaaggtgaatttaaaatgcagattgcacatgccaaattagcagcagaaaccggt 
gtagcacctgagaaaattttcttagttgaaaaaggcgacgtgattagttataacggtaaa 
gatatgattttaaatgaaaaagttcaatcaggtaatatacttattgatgggattggcgtt 

ggtgacgtaggtaatatcgtattaagagacagacatctattagccgaagacggtattttt
attgcggttgtgacattagatcctaaaaatcgacgtattgctgcaggacctgaaattcaa 
tcaagaggcttcgtctatgttagagaaagtgaagaacttttgaaagaggctgaagaaaaa 
gtacgtaaaattgtagaggaaggtcttcaagaaaaacgaatagaatggtcagaaatcaag 
caaaatatgagagatcaaatcagtaagttactatttgagagtacaaaacgccgtccaatg 

attattccagtcatatcggagatctaa

Sequence 1740 

MKARNIKKKVRYYTVNHDSIMRFKNVNVSFFNTTHSIPDSLGVCIHTSYGSIVYTGEFKF 
DQSLHGHYAPDLKRMAEIGDEGVFALISDSTEAEKPGYNTPENIIEHHMYDAFAKVKGRL 

IVSCYASNFVRIQQVLNIASQLNRKVSFLGRSLESSFNIARKMGYFDIPKDLLIPINEVE
NYPKNEVIIIATGMQGEPVEALSQMARKKHKIMNIEEGDSIFLAITASANMEVIIADTLN 
ELVRAGAHIIPNNKKIHASSHGCMEELKMMLNIMKPEYFVPVQGEFKMQIAHAKLAAETG 
VAPEKIFLVEKGDVISYNGKDMILNEKVQSGNILIDGIGVGDVGNIVLRDRHLLAEDGIF 
IAVVTLDPKNRRIAAGPEIQSRGFVYVRESEELLKEAEEKVRKIVEEGLQEKRIEWSEIK 

QNMRDQISKLLFESTKRRPMIIPVISEI*

Sequence 1741 

Contig_0648_pos_2002_4239,

is similar to (with p-value 0.0e+00) 

>sp:sp|P21458|SP3E_BACSU STAGE III SPORULATION PROTEIN E.
>pir:pir|S09411|S09411 spoIIIE protein - Bacillus subtilis
>gp:gp|Z99112|BSUB0009_150 Bacillus subtilis complete genome (section 9 of 21): from 1598421 to 1807200. NID: g2633902.
atgattgatagcttttttaattatctttttggtatgagtcgatatttaacttatatttta 

gtacttattgcaacaatttttataacatactctaagcaaatacctagaactcgacgtagt
atcggtgcaatagttttacaattagctttgttatttatagcgcaattgtattttcatttt 
tcacataatatcacttctcaaagagagcctgtactgtcctttgtttataaagcttatgaa 
caaacacattttccaaattttgggggaggcttaataggtttttatttacttaaactattt
atacctctcatatctattgtaggtgtaataataattactatcctattactagcttcgagt 
ttcattttattacttaatttaagacatagagatgttacaaaaagtttattcgacaacctc
aagtcatcaagtaatcatgcatctgagtcaataaaacaaaaaagagaacaaaataagatt 
aaaaaagaagaaaaagcccaattaaaagaggcaaaaattgaacgaaaaaaacaaaaaaaa 
tcacgtcagaataataatgtcattaaagatgttagtgattttccagagatttctcagtca 
gacgatattccaatatatggtcataatgagcaagaagataaaagaccaaatactgctaac
caacgtcaaaaacgtgttttggataatgaacaatttcaacaatcattaccaagtaccaaa 
aatcaatcaataaataataatcagccatctacaaccgctgaaaacaatcaacaacaaagt 
caggctgaaggctcaatatctgaagctggtgaagaagccaatattgagtatacggtgcca 
cctttatccttattaaaacagcctactaaacaaaaaactacttcaaaagctgaagtccaa 

cgtaaaggtcaggttttagaatctacactaaaaaactttggagttaatgctaaagtaaca
caaattaaaatcggtcctgcagttacgcaatatgaaattcaaccagcgcaaggtgttaaa 
gtaagtaaaatagtcaatctccataatgacattgcattagctttggctgcgaaagatgta 
cgaatagaagcacctattccaggtcgctctgcggtaggaattgaggttcccaatgataaa 
atctcacttgtcactctaaaagaagttttagaagataagttcccatctaagtataaatta 

gaagtcggcattggtagagatatttctggtgatccaatatcaattcaattaaatgaaatg
cctcacttactcgttgctggttcaacaggaagcggtaaatcagtttgtattaatggtatt 
ataacgagtatattactcaacacaaaaccgcacgaagttaaacttatgttaatcgatcct 
aaaatggtagagttaaatgtttacaatggtattcctcatttacttataccggttgtaaca
aacccacataaagcgtctcaagctttagaaaaaattgtttcagaaatggaacgtcgttat 

gatttgtttcaacattcatcgacacgaaatattgaaggatataaccaatatatacgcaaa
cagaatgaagaacttgatgaaaaacaacctgagttaccgtatatcgtcgtaatagtggat 
gaattggctgatttaatgatggttgcaggtaaagaagtagaaaatgctatccaacgtatt 
actcaaatggctagagcagcgggtatacacttaattgtagctactcaaagaccttccgtt 
gatgttattactggtattattaaaaataacattccatcaagaattgcgttcgctgtaagt 

tctcaaactgactctagaacaataattggtgctggtggagctgaaaagctacttggtaaa
ggtgatatgctatatgttggtaacggagaatctactacaacccgaattcaaggtgctttt 
ttaagtgatcaagaagtgcaagatgttgttaattatgttgtagagcaacagaaagcaaat 
tatgttaaagaaatggaaccagatgcacctgtagataaatcagaaatgaagagtgaggat 
gctttatatgatgaagcttatttatttgtaatagaaaagcaaaaagctagtacttcttta 

ttacaacgacaatttagaatcggttataatcgagcttcaaggctcatggatgatttggaa
cgtaaccaagttattggtccacaaaaaggaagtaaacctagacaaatattagttgattta 
gaaaatgacgaggtgtaa 

Sequence 1742 

MIDSFFNYLFGMSRYLTYILVLIATIFITYSKQIPRTRRSIGAIVLQLALLFIAQLYFHF
SHNITSQREPVLSFVYKAYEQTHFPNFGGGLIGFYLLKLFIPLISIVGVIIITILLLASS 
FILLLNLRHRDVTKSLFDNLKSSSNHASESIKQKREQNKIKKEEKAQLKEAKIERKKQKK 
SRQNNNVIKDVSDFPEISQSDDIPIYGHNEQEDKRPNTANQRQKRVLDNEQFQQSLPSTK 
NQSINNNQPSTTAENNQQQSQAEGSISEAGEEANIEYTVPPLSLLKQPTKQKTTSKAEVQ 

RKGQVLESTLKNFGVNAKVTQIKIGPAVTQYEIQPAQGVKVSKIVNLHNDIALALAAKDV
RIEAPIPGRSAVGIEVPNDKISLVTLKEVLEDKFPSKYKLEVGIGRDISGDPISIQLNEM 
PHLLVAGSTGSGKSVCINGIITSILLNTKPHEVKLMLIDPKMVELNVYNGIPHLLIPVVT
NPHKASQALEKIVSEMERRYDLFQHSSTRNIEGYNQYIRKQNEELDEKQPELPYIVVIVD 
ELADLMMVAGKEVENAIQRITQMARAAGIHLIVATQRPSVDVITGIIKNNIPSRIAFAVS

SQTDSRTIIGAGGAEKLLGKGDMLYVGNGESTTTRIQGAFLSDQEVQDVVNYVVEQQKAN
YVKEMEPDAPVDKSEMKSEDALYDEAYLFVIEKQKASTSLLQRQFRIGYNRASRLMDDLE 
RNQVIGPQKGSKPRQILVDLENDEV* 

Sequence 1743 
Contig_0648_pos_4242_4955,

putative peptide of unknown function 

atgtcggaaatgagtgcaatctatagagtaaaacaatacattttaaatttaatcaaagat
ggtgaactaaccaatggaagtaaattacctagtaatttgtcaattgcgagagcattaaat 
gttaaaacagatgatgtttatgatggtatagatgagttgattactgaacaagtagtaacg
gataattttgaagaggggactagcgtaaaagtaaagccccctttctattacccgttaaat
aaaattattagtatagggactatgattaaagaagcgggttatgaagcaggaacagaatat 
ctgaatcttgacgagcaacctgcaactattttagatgctgaacatttaggtatagaaaca 
aaagaacctataacaattattgagagactaaggactgctaatcataagcctgtcgtatat 
tgtttagacaaaatagcaaaaacttatctaacttgtacagattatcaacagagtagtggt 
tcaatgttagaagctattaaagcatctacaaatcatcaaatcatgcatgcagaaatggat
ttagaagcaattagttacgaaccccatatctctgaagtgcttaatgcttcacctcacgaa 
gggcttatgttacttaaagtagtacattatgacgaaaagcatcaaccaattttgtattct 
ttaaattatattaagagtagtttagttaaattcactattactaaaagtgaataa 


Sequence 1744 

MSEMSAIYRVKQYILNLIKDGELTNGSKLPSNLSIARALNVKTDDVYDGIDELITEQVVT 
DNFEEGTSVKVKPPFYYPLNKIISIGTMIKEAGYEAGTEYLNLDEQPATILDAEHLGIET 
KEPITIIERLRTANHKPVVYCLDKIAKTYLTCTDYQQSSGSMLEAIKASTNHQIMHAEMD 
LEAISYEPHISEVLNASPHEGLMLLKVVHYDEKHQPILYSLNYIKSSLVKFTITKSE*

Sequence 1745 
Contig_0648_pos_5421_6257,
is similar to (with p-value 2.0e-28)
>gp:gp|AF082738|AF082738_1 Streptococcus pyogenes phosphotidylglycerophosphate synthase (pgsA) and ABC transporter ATP-binding protein (stpA) genes, complete cds; and unknown genes. NID: g3426363.

atggaagataataaagcacagtattcatttcttcagttaatgaattatatgtttaaacaa 
gaaccctatcgatatatagcgacaggtcaattagaacaaattccacaagtgacttctgaa
agtctatacgatacatatctatccatggtacaaaatgatgattgtgccatatatgttgta 
ggaaatattaacaaagaggaagtaacgcaactaattctagataagtttgcaattaagcct 
ttctatttagaaaataaagaaagtactgaaatcacaccttcttttgatcaaccgcaatat 
ataattgaaaaagacgatgttgaccaagctaaattgaatttgggatatcgctttccatct 
tattatgggaaaagtaattactatgcatttatagtattaaatatgatgtttggaggagat
ccttcctcagtactatttaatgaagtcagagaaaagcaaagtttggcatactctatacat 
tcacaaattgatggtaaaaacggatttttatttgttttaagtggtgtttctgctgagaaa 
tatgagcaagcaaaagatactgtcatcaaagagtttgataagataaaaaatggagatttt 
gattctaataaaattgaattagctaaaaaaatcattatttcccatagacacgaagcatca 
gatagacctaaaagtataattgaaatactacataatcaattattattaaaccgacagcaa
actgatcaagattttataaatgcagttaatcaagtgacgaaaaaagatgttattaaattg
gcaaatgaagctgttctagatacaatttatgtactaacgaaaggagaccaacactga 

Sequence 1746 

MEDNKAQYSFLQLMNYMFKQEPYRYIATGQLEQIPQVTSESLYDTYLSMVQNDDCAIYVV
GNINKEEVTQLILDKFAIKPFYLENKESTEITPSFDQPQYIIEKDDVDQAKLNLGYRFPS 
YYGKSNYYAFIVLNMMFGGDPSSVLFNEVREKQSLAYSIHSQIDGKNGFLFVLSGVSAEK 
YEQAKDTVIKEFDKIKNGDFDSNKIELAKKIIISHRHEASDRPKSIIEILHNQLLLNRQQ
TDQDFINAVNQVTKKDVIKLANEAVLDTIYVLTKGDQH* 


Sequence 1747 

Contig_0648_pos_6689_7012,

putative peptide of unknown function 

atgtaccaggaacaaccaggatataaattaatgtttaatactttaagggctatgtattcc 
aagcacccgatacgggtggatatcgctggtagtgttgaaagcatttatgaaataacaaaa
gatgatttatatctatgctatgagacattttatcatccctctaatatggtgttgtttgtg 
gtaggcgatgttagtcctcaatcgataattaaacttgtagaaaagcatgaaaatcaaaga 
aataaaacttatcaaccacgtattgaacgtgcgcaaattgatgagcctagagagataaat 
cacggtttgtttctgagaaaatga 


Sequence 1748 

MYQEQPGYKLMFNTLRAMYSKHPIRVDIAGSVESIYEITKDDLYLCYETFYHPSNMVLFV
VGDVSPQSIIKLVEKHENQRNKTYQPRIERAQIDEPREINHGLFLRK* 

Sequence 1749

Contig_0648_pos_7033_0,

putative peptide of unknown function 

atgctaggttttaaaaatgaaccattagatgaaagtgcaactaaatttgttcaaagagat 
ttggaaatgacatttttctacgaattggtttttggagaggaaacggagttttatcaacaa
cttttaaataaagatttaatagatgaaacattcggttatcaatttgtattggaaccgagc
tacagtttttcaattattactagtgcaacacaacagcctgatctatttaaacaattaata 
atggatgaattaagaaaatataaaggaaaccttaaagatcaagaagcatttgatttgttg 
aaaaagcaatttattggagaattcatatcaagtttaaattctccagaatatattgctaat 
caatatgcaaaactctatttcgagggagtgagtgtatttgatatgcttgatatcgtagaa
aatattacgttagagagtgtaaatgaaacttccgaattattcttgaactttgaccaactt 
gttgatagtcgtttggagatggaaaataga 

Sequence 1750 

MLGFKNEPLDESATKFVQRDLEMTFFYELVFGEETEFYQQLLNKDLIDETFGYQFVLEPS
YSFSIITSATQQPDLFKQLIMDELRKYKGNLKDQEAFDLLKKQFIGEFISSLNSPEYIAN 
QYAKLYFEGVSVFDMLDIVENITLESVNETSELFLNFDQLVDSRLEMENR 

Sequence 1751 
Contig_0649_pos_211_588,

putative peptide of unknown function 

atgaatattttgatgtttataaaaatgaatgatattgccataaacggacttatgttgctc 
atttctcacgcaattatgatattagaagctatctatttttatcctcgttttaaaatatct 
aaattggctggattgatgagtttcatatgggtgacgatcaatgacgtaatagattacata 
tacggacaatatccctactatgattttatcgccaaacatttaattgaagtaggggtattg
gcttatagtctcactatcatttcgtatattttatttttaaaattacaaaagtggttgaaa 
gttaaaacatttgattaa 

Sequence 1752 

MNILMFIKMNDIAINGLMLLISHAIMILEAIYFYPRFKISKLAGLMSFIWVTINDVIDYI
YGQYPYYDFIAKHLIEVGVLAYSLTIISYILFLKLQKWLKVKTFD* 

Sequence 1753 

Contig_0649_pos_1920_2990,

is similar to (with p-value 0.0e+00)

>gp:gp|L38424|BACJOJC_6 Bacillus subtilis dihydropicolinate reductase (jojE) gene, complete cds; poly(A) polymerase (jojI) gene, complete cds; biotin acetyl-CoA-carboxylase ligase (birA) gene, complete cds; jojC, jojD, jojF, jojG, jojH genes, complete cds's. NID: g755600.

atggcagaacgtggccatgaagttcactttattacctcaaacataccctttagaatacgc 
aaacctttacctaacatgacgttccaccaagttgaagtcaatcaatatgccgtattccaa 
tatccaccatacgatattacattaagtacaaaaatttctgacgttatacaagaatatgat
ttagacatattgcatatgcattatgctgtacctcacgctgtatgtggtatattagcgaaa 

caaatgtcaggtaaaaacgtcaaaattatgacaacactacatggcactgatataactgtg
ttaggttatgaccatactttacaaaacgcgataaaatttggcatagaacaaagtgatatt 
gtaacaagtgttagccattctctagcacagcaaacttatgaaattatcaatactaaaaag 
gaaatcatccctatatataattttataagggaaaatgaattcccaactcggcataatgaa
gaattaaaagattgttatggtatttcacctgaagaaaaggtattgatacatgtttctaat 

ttcagaaaagtaaaacgtattgatacagtgattgagacatttgcaaaagttcatgagagt
ataccatccaagttgatacttttaggagatggtccagaattaatcgatatgcgacataaa 
gcacgagaacttgatgttgaaacacacgtactctttttaggcaaacaaaatgacgtaagc
gcattctaccaactatctgatttagtactactcttaagtgagaaagaaagttttggatta 
actctcttagaagcaatgaaaacaggcgtcttacctatagggagtcatgcaggtggtatt 

aaagaggtcatcagacatgaagaaactggatttatagtagatataggggatagtacacaa
gctgcaaaatatgctattaaacttttatcaaatccagagttatatcaaaaaatgcaatca 
caaatgctgaaagatattgaagcaagatttagttcagatttaattactgaccaatatgaa 
aactattatcgaaagatgctagaacaaggtgagaacaacaatgagtcatga 

Sequence 1754

MAERGHEVHFITSNIPFRIRKPLPNMTFHQVEVNQYAVFQYPPYDITLSTKISDVIQEYD 
LDILHMHYAVPHAVCGILAKQMSGKNVKIMTTLHGTDITVLGYDHTLQNAIKFGIEQSDI 
VTSVSHSLAQQTYEIINTKKEIIPIYNFIRENEFPTRHNEELKDCYGISPEEKVLIHVSN
FRKVKRIDTVIETFAKVHESIPSKLILLGDGPELIDMRHKARELDVETHVLFLGKQNDVS 
AFYQLSDLVLLLSEKESFGLTLLEAMKTGVLPIGSHAGGIKEVIRHEETGFIVDIGDSTQ
AAKYAIKLLSNPELYQKMQSQMLKDIEARFSSDLITDQYENYYRKMLEQGENNNES*

Sequence 1755 
Contig_0649_pos_3166_4182,

is similar to (with p-value 1.0e-51) 

>sp:sp|P42977|PAPS_BACSU POLY(A) POLYMERASE (EC 2.7.7.19) (PAP).
>gp:gp|L47709|BACYPIA_15 Bacillus subtilis (clone YAC15-6B) ypiABF genes, qcrABC genes, ypjABCDEFGHI genes, birA gene, panBCD genes, dinG gene, ypmB gene, aspB gene, asnS gene, dnaD gene, nth gene and ypoC gene, complete cds's. NID: g1146223.
>gp:gp|Z99115|BSUB0012_185 Bacillus subtilis complete genome (section 12 of 21): from 2195541 to 2409220. NID: g2634478.

gtggggaaagaacacggtacgatcaacgtggtctttcaaaatgacaattatgaaattact
acattcagatctgaagatgaatacatcgatcatcgtaggccaagtgaagtgtattttgta 
agagacttatatcaagatgttcaacgtagagattttacaatgaatgctatagcaatggat 
ttaaattaccggttgtatgattattttaatggtcaacaagatataaacaatcgagtaatt 
cgtactgttggtgtaccaagtgaaaggttttcagaagacgcgcttcgtatcattagagga 

ttacgttttcaatcacaacttaattttcaaattgattcagacacattacatgcaatgtct
tctcagatttcagatatacaatatttatccgttgaacgtgtagtagtagagcttaaaaaa 
cttatcatgggaaacaatgttaaacaaagttttgaagtcatgcaaaacatgaaagcattt 
aattatataccttttttcaaatcatttgagatgtctcatcttcatatagatgagcccatc
acatttgaactttggattgcaatcttaatcgtccaacaaccaaaagatatacaattaagc 

accttgaaaatcagcaatcaagaaaaagcaactatcaaaaaatgggttacactcatccaa
acattgcctaagatacagtcaaagcaatctttaataacattagtatatgattacaattta 
aatgatattgaaatcctattatcattacatcatttgcttaaacaaaatgggcttacaaca 
gccaatcatttaatcattaatgaaataagtattcgcgaagcaaatgaaaaattacctatc
cattgcagaaaagaattggcaataaatggtaaagatatactcaatcatacgaataaaaat 

tcaggaccatggctaaaagatacacttagagaaatagaaatcgcagtcatatcaaatcaa
atagtcaacactaaagaagaaatattagaatgggtggatgcacatgtcaaaatatag 

Sequence 1756 

VGKEHGTINVVFQNDNYEITTFRSEDEYIDHRRPSEVYFVRDLYQDVQRRDFTMNAIAMD 
LNYRLYDYFNGQQDINNRVIRTVGVPSERFSEDALRIIRGLRFQSQLNFQIDSDTLHAMS
SQISDIQYLSVERVVVELKKLIMGNNVKQSFEVMQNMKAFNYIPFFKSFEMSHLHIDEPI
TFELWIAILIVQQPKDIQLSTLKISNQEKATIKKWVTLIQTLPKIQSKQSLITLVYDYNL 
NDIEILLSLHHLLKQNGLTTANHLIINEISIREANEKLPIHCRKELAINGKDILNHTNKN 
SGPWLKDTLREIEIAVISNQIVNTKEEILEWVDAHVKI* 


Sequence 1757 

Contig_0649_pos_4199_5140,

is similar to (with p-value 2.0e-47) 

>sp:sp|P42975|BIRA_BACSU BIRA BIFUNCTIONAL PROTEIN (BIOTIN OPERON REPRESSOR) (BIOTIN--[ACETYL-COA-CARBOXYLASE] SYNTHETASE) (EC 6.3.4.15) (BIOTIN--PROTEIN LIGASE).
>gp:gp|L47709|BACYPIA_16 Bacillus subtilis (clone YAC15-6B) ypiABF genes, qcrABC genes, ypjABCDEFGHI genes, birA gene, panBCD genes, dinG gene, ypmB gene, aspB gene, asnS gene, dnaD gene, nth gene and ypoC gene, complete cds's. NID: g1146223.
>gp:gp|Z99115|BSUB0012_184 Bacillus subtilis complete genome (section 12 of 21): from 2195541 to 2409220. NID: g2634478.
atgttgtatgaacatcaatcaaattacatatcagggcaatatattgccgatcaactcaat 
atttctagagcaggtgtcaaaaaagttattgacctattaaaagaagatggttgtgatatc 
aagtcaataaaccacaaaggccatcaactgaattcattacctgatcagtggtatagcggt
attgtaaaacctattctcgatgaacttggcctttttaatcatctagaagtttatcacact 
gtagattcaacacaattaaaagcaaagagagcactcgttggaaataaagatactttttta 
attttgagcgatgaacaaaccgaaggtagaggtagattcaatcgtaattgggaatcatct 
aaaggaaaaggcttatggatgtcactagtgctaagacctgacgtacctttttctatgata
cctaaatttaatttatttattgctttaggtattagagatgctattcaacaattttcgaac
gaacgtgtaacaattaaatggccaaatgatatatatattggtaataaaaaaatttgcgga 
tttttaactgaaatggttgcaaattatgatgaaatagaagcaataatttgtggtataggt 
ataaatatgaatcatgttgaaagtgattttgacgaggatattaaagataaagcaacaagt 
atacgcatgcattccgatagtataattaatagatatacttttttaactgcattattaact
caaattatacatcgctttgatcaatttttacatcaaacttttgagtcaattcgagaagaa 
tatattcacgctacaaatatatggcatcgtcaacttaaattcactgaaaataatcatcaa 
tttttgggggaagccatagatattgattcagatggattccttattgttaaagatgaaaaa 
ggtcaattacatcgacttatgagtgcagatatagatttataa 


Sequence 1758 

MLYEHQSNYISGQYIADQLNISRAGVKKVIDLLKEDGCDIKSINHKGHQLNSLPDQWYSG 
IVKPILDELGLFNHLEVYHTVDSTQLKAKRALVGNKDTFLILSDEQTEGRGRFNRNWESS 
KGKGLWMSLVLRPDVPFSMIPKFNLFIALGIRDAIQQFSNERVTIKWPNDIYIGNKKICG 
FLTEMVANYDEIEAIICGIGINMNHVESDFDEDIKDKATSIRMHSDSIINRYTFLTALLT
QIIHRFDQFLHQTFESIREEYIHATNIWHRQLKFTENNHQFLGEAIDIDSDGFLIVKDEK
GQLHRLMSADIDL* 

Sequence 1759 
Contig_0649_pos_5291_0,

is similar to (with p-value 3.0e-66) 

>sp:sp|P54394|DING_BACSU PROBABLE ATP-DEPENDENT HELICASE DING HOMOLOG.
>gp:gp|L47709|BACYPIA_20 Bacillus subtilis (clone YAC15-6B) ypiABF genes, qcrABC genes, ypjABCDEFGHI genes, birA gene, panBCD genes, dinG gene, ypmB gene, aspB gene, asnS gene, dnaD gene, nth gene and ypoC gene, complete cds's. NID: g1146223.
>gp:gp|Z99115|BSUB0012_180 Bacillus subtilis complete genome (section 12 of 21): from 2195541 to 2409220. NID: g2634478.

atgatacgtaccgatttagaaattccaccgtttatccaagccttaacttctatagaagaa
gaaatgttagtacaggcaccttattttaatgaggtcgcagatgacatttatcaacttatc 
aaagactgtgtttttgttgcacataatatctcatttgatcttaattttataaaaaaagct 
tttgaaaaatgtaatattcaatttaaacctaaaagagtaatggatactttagaattgttt 
aaaatcgcattccctacagacaaaagctaccagttaagtgcattggctgaatctcatcat 

ataccattaaataatgcacatagagcagatgaagatgcaacaacaactgctaaattgatg
attaaagcttttgagaagttcgagcaattgcatttagatacacaaaaacaactgtactat 
ttaagtaaaaatctcaagtatgatctttatcatattttgtttgaaatggttagaaattat 
caaactaaaccgcctaacaaccaatttgaacaatttgaacaaattatttaccgtaaacaa 
attgatttaaaaaaaccggctgtcaattttgatggtaccttaaaagatttatataaaaat 

gtcactcaatcattaaatcttacatatcggccacaacagttatacctagccgaaattata
cttgatcagttgatgcatagtgataaggcaatgattgaagctcctttgggtagtggaaag 
tctcttgcttacctgcttgcagcaacaatgtataatattgagaccggtcgtcatgtaatg 
atttcaacaaatacaaaattattacaaagtcagctattagagaaagacataccattactc 
aatgatgttttagattttaaaattaacgcgtcattaatcaaaagtaaaaatgattatata 

tctcttggtcttatcagccaaattcttaaagacgatacaaataattatgaagtaagtatt
cttaaaatgcagttacttatttggataactgaaacaaatactggggacatacaggaatta 
aatcttaaaggtggacaaaaaatgtatgttgaccaaaaaattgaaacatacgttccagtt 
cgtcatgatatccattattataattatataaaaagaaatgctcaaaacatacaaattggt 
atcactaatcatgcgcacttaattcattcagacagtgaaaacactatatatcaactattt 

gatgattgcatcatcgatgaagcacatagattgcctgactatgcgctaaatcaagttact
aatgatttaaattattcagatgttaaatatcaattaggacttattggcaaaaatgaaaat 
gaaaaactacttaaagcagtagacaaacttgagcaacaacgtattttagagaaactagat 
attgcacctatagacgtttttggactgaaaataaatatcaatgagttacatgatttaaat 
gagcaactattcactacaatttataatattattcaaacatcagacgtttatgatgatgac 

attcataagtatcattacgtttatgactttgaaacgggtgagattttaaaagatttacgt
gcaatcatagataaattaaataaaacgatagaaatttttaacggaatgaatcacaaaaca 
atcaagtctgtacggaaacaattattatacttacatgacaaatttaaacttata 

Sequence 1760 
MIRTDLEIPPFIQALTSIEEEMLVQAPYFNEVADDIYQLIKDCVFVAHNISFDLNFIKKA
FEKCNIQFKPKRVMDTLELFKIAFPTDKSYQLSALAESHHIPLNNAHRADEDATTTAKLM 
IKAFEKFEQLHLDTQKQLYYLSKNLKYDLYHILFEMVRNYQTKPPNNQFEQFEQIIYRKQ 
IDLKKPAVNFDGTLKDLYKNVTQSLNLTYRPQQLYLAEIILDQLMHSDKAMIEAPLGSGK 
SLAYLLAATMYNIETGRHVMISTNTKLLQSQLLEKDIPLLNDVLDFKINASLIKSKNDYI
SLGLISQILKDDTNNYEVSILKMQLLIWITETNTGDIQELNLKGGQKMYVDQKIETYVPV 
RHDIHYYNYIKRNAQNIQIGITNHAHLIHSDSENTIYQLFDDCIIDEAHRLPDYALNQVT 
NDLNYSDVKYQLGLIGKNENEKLLKAVDKLEQQRILEKLDIAPIDVFGLKININELHDLN 
EQLFTTIYNIIQTSDVYDDDIHKYHYVYDFETGEILKDLRAIIDKLNKTIEIFNGMNHKT 
IKSVRKQLLYLHDKFKLI

Sequence 1761 

Contig_0650_pos_3702_4013,

putative peptide of unknown function 

gtgtataatggtataaatgttatgtatataattatatggaggcattatatgaagagtatg
aagcaaatcgctgatgaattaaacgtaacaaagatgactgtttataataatgctaagaaa 
gcaaatgtgaaatttcaaaaaattgaaaatgtaaattatttatcttccgaagatgaagtt 
atagtagctaatagaataaaaaaaaatcaaaataaaactgattacttcgataatgaaaaa 
aaggtagaaaccaaacccaacaatgataatcttgtaaaaatgaaacgattaaacatttat 

ataaccaattag

Sequence 1762 

VYNGINVMYIIIWRHYMKSMKQIADELNVTKMTVYNNAKKANVKFQKIENVNYLSSEDEV 
IVANRIKKNQNKTDYFDNEKKVETKPNNDNLVKMKRLNIYITN*


Sequence 1763 

Contig_0650_pos_8636_9154, 

is similar to (with p-value 4.0e-82) 

>gp:gp|U50077|SAU50077_1 Staphylococcus aureus multidrug resistance plasmid pKH8 replication protein (rep) gene, qacC* gene, and multidrug resistance protein (qacC) gene, complete cds. NID: g1236637.

atgacgaaaagtgggaaacaacgcccatggagagaaaagaagatagataatgtaagttat 
gcagatatactggaaattttaaaaataaaaaaggcttttaatgtaaaacaatgtggtaac 

gtcttagagttcaagccgactgatgaaggttatttgaagttacataagacatggttttgt
aagtcgaaactctgcccagtttgtaattggaggcgtgctatgaaaaatagttatcaagct 
caaaaagtgattgaagaagttgttaaagaaaaaccaaaagcgcgttggttatttttaaca 
ctttcaacgaaaaatgcgatagatggggatactttagaacaaagtttgaaacatttaacg 
aaagcatttgataggttaagtagatataaaaaagtgaagcaaaatcttgttgggtttttg 

cgttcaacggaagtaacagttaataaaaatgatggtagttataatcaacaattcggaacc
tacacagtcgcaaaaaagtatgatagaagaattgattaa 

Sequence 1764 

MTKSGKQRPWREKKIDNVSYADILEILKIKKAFNVKQCGNVLEFKPTDEGYLKLHKTWFC
KSKLCPVCNWRRAMKNSYQAQKVIEEVVKEKPKARWLFLTLSTKNAIDGDTLEQSLKHLT
KAFDRLSRYKKVKQNLVGFLRSTEVTVNKNDGSYNQQFGTYTVAKKYDRRID* 

Sequence 1765 

Contig_0650_pos_9204_9533, 

putative peptide of unknown function

atgttaaaagaagatatgaagttgccaaaatcttatatttttgaaattgcttccaactgg 
aagaaaattggtatttcaaatgccaaacaagcatatgaatatgcattacaagttaatcaa 
cctaaaaattacgaaacacattctaatgataaacgacagaacaatcgtggaagacaaaat 
caatttttatccaaagaaaagacacctaaatggcttcaaaatagggacgatcaagaagaa 

aataaagaaataaatgatgacactctcgaagaagatcgacaagcatttcttgaaaagtta
aatcaaaagtggaaggaggaagataactaa

Sequence 1766

MLKEDMKLPKSYIFEIASNWKKIGISNAKQAYEYALQVNQPKNYETHSNDKRQNNRGRQN 
QFLSKEKTPKWLQNRDDQEENKEINDDTLEEDRQAFLEKLNQKWKEEDN*

Sequence 1767

Contig_0650_pos_9554_10453,

is similar to (with p-value 6.0e-63)

>sp:sp|P06567|DNAI_BACSU PRIMOSOMAL PROTEIN DNAI. >pir:pir|B
24720 UQBS44 dnaA protein homolog, 44K - Bacillus subtilis >
gp:gp|AF008220|AF008220_192 Bacillus subtilis rrnB-dnaB geno
mic region. NID: g2293135. >gp:gp|X04963|BSDNAB_1 Bacillus s
ubtilis dnaB gene for initiation of chromosomal replication.
NID: g39880. >gp:gp|Z99118|BSUB0015_163 Bacillus subtilis c
omplete genome (section 15 of 21): from 2795131 to 3013540.
NID: g2635200. >gp:gp|Z75208|BSZ75208_2 B.subtilis genomic s
equence 89009bp. NID: g1769994.

atgggcgattctcaaaatctagataaacgtatacaaaaaataaaacaaaatgtaatcaat
gatactgacgttaaacattttcttgagaaaaatcgtagtaatataactaatgagatgata 
gacgaagatttaaatgttcttcaagagtataaagatcaacaaaaagtttatgatggacat 
cgctatgatgattgtccgaattttgtaaaaggacatgttcctgaactatatattgaaaat 
gaaagaatcaaaattagatatctaccttgcccgtgtaaaattaaacatgatgaggaacga 

tttgattcacaacttattacatctcaccatatgcaaagagatacacttcatgcaaagctc
aaagatatttatatgaataatcgagagagacttgatgtagcaatggcagctgatcaaatc 
tgtacagcaattactaacgatgaaaaagtaaaggggttatatttatatggtccttttggt 
acaggaaaatcattcatattgggtgctattgcaaatcaacttaaatcgcaaaagatttca
tcaacaattgtatatttaccagaatttattcgcactttaaaaggtggctttaaagacggt 

agttttgagaaaaaattacaacgtgtgcgagaagctaatattttgatgttagatgatatt
ggcgcagaagaagtcacaccgtgggtaagagatgaagtgattggtcctttattacattat 
agaatggtacatgaacttcctacattttttagttctaactttaattatagtgagcttgag 
catcatctttcaataactagagatggcactgaaaagactaaagcagcacgaattattgaa 
agaattaagactttatcgacaccttattatttgactggtaaaaattttagaaacaattga 


Sequence 1768 

MGDSQNLDKRIQKIKQNVINDTDVKHFLEKNRSNITNEMIDEDLNVLQEYKDQQKVYDGH 
RYDDCPNFVKGHVPELYIENERIKIRYLPCPCKIKHDEERFDSQLITSHHMQRDTLHAKL 
KDIYMNNRERLDVAMAADQICTAITNDEKVKGLYLYGPFGTGKSFILGAIANQLKSQKIS
STIVYLPEFIRTLKGGFKDGSFEKKLQRVREANILMLDDIGAEEVTPWVRDEVIGPLLHY 
RMVHELPTFFSSNFNYSELEHHLSITRDGTEKTKAARIIERIKTLSTPYYLTGKNFRNN* 



Sequence 1769

Contig_0650_pos_10806_12743, 

is similar to (with p-value 0.0e+00) 

>sp:sp|P18255|SYT1_BACSU THREONYL-TRNA SYNTHETASE 1 (EC 6.1.
1.3) (THREONINE--TRNA LIGASE) (THRRS). >pir:pir|B37770|YSBST
1 threonine--tRNA ligase (EC 6.1.1.3) 1 - Bacillus subtilis
>gp:gp|AF008220|AF008220_195 Bacillus subtilis rrnB-dnaB gen
omic region. NID: g2293135. >gp:gp|M36594|BACTRNASB_1 B.subt
ilis threonyl-tRNA synthetase (thrSv) gene, complete cds. NI
D: g143765. >gp:gp|Z99118|BSUB0015_160 Bacillus subtilis com
plete genome (section 15 of 21): from 2795131 to 3013540. NI
D: g2635200. >gp:gp|Z75208|BSZ75208_5 B.subtilis genomic seq
uence 89009bp. NID: g1769994.

atgaatcaaattaatattcaatttccagatggtaatacaaaagaatttgataaagggact 
actacagaagacatcgctcaatcaattagtccaggattaagaaaaaaagcagttgcggga 
aaattcaatggtcaacttgtagatttaacacgccctttagaacaagatggagctattgaa
attattactcctgggagtgaagaagcgttagaagtacttcgtcattcaacagctcattta 
atggcacaagcattaaaacgtttatacggagacgttaaatttggagttggacctgtaata
gaaggcggattctattatgattttgatatggatgataaggtttcatcggatgattttgat 
aaaattgagaaaacaatgaaacaaattgtgaacgaaaatcataaaattgtaagagaagta 
gttagtaaagaaaaagcaaaagacttcttcaaggatgacccttataaattagaacttatt
gatgcaattcctgaagatgagagtgtaacactttatactcaaggtgaatttactgattta 
tgtcgaggtgtacacgtaccttctacttctaaaattaaagagttcaaactattatctaca 
gctggtgcttattggcgtggaaatagtgataataaaatgttacaacgaatttatggtaca 
gcattctttgacaaaaaagatttgaaagcacatctaaaaatgttggaagaacgtcgtgag
cgtgatcatcgtaaaattggtaaagatttagaattgtttacaaacaatcaactcgttggt 
gctggtttaccattatggttaccaaatggtgctacaatacgtagggaaatagaacgttat 
attgtcgataaagaagtaagtatgggatacgatcatgtttacacaccagtattagccaat 
gttgatttatataaaacatctggtcactgggatcattatcaagaagatatgttcccagca 

atgaagttagatgaagacgaagcaatggtcttaagaccaatgaactgtccacatcatatg
atgatttataaaaacaaacctcattcttatcgcgaattacctatacgtattgctgaattg 
ggtactatgcatcgttacgaagcaagtggtgcagtatcaggtttacaacgtgttcgagga 
atgacattgaatgattcccatattttcgttagacctgatcaaattaaagaagaatttaaa 
cgtgtagttaatatgattcaagatgtgtacaaagattttggttttgaagattatcgcttc 

agattgagttatagagatcctgaagataagcataagtactttgatgatgatgaaatgtgg
gaaaaagctgaatccatgcttaaagaagcatcagatgaattaggtttaacttatgaagaa 
gctattggtgaggcagcattctatggacctaagttagatgttcaagtaaaaacagctatg 
ggaaaagaagaaactctatcaacagcacaacttgattttcttttaccagaacgttttgac 
ttaacgtacattggtcaagatggagaacaacatcgtcctgtagttatacaccgtggtgta 

gtttctactatggaacgttttgttgcatttttaacagaagaaacaaaaggtgcatttcca
acttggttggcgcctatgcaagttgaaattattcctgtaaatatagatttacattatgat 
tatgcaagacttttacaagatgaactaaaatcgcaaggtgtccgcgttgaaattgatgac 
cgtaatgaaaaaatgggatataaaattcgtgaagctcaaatgaaaaaaataccttatcag 
attgttgtaggtgaccaagaagtagagaatcaagaagtaaatgtaagaaaatatggttct 

gaaaaacaagaatcagttgaaaaagatgaatttatttggaatgttattgatgaaatccgt
ttgaaaaagcatagataa 

Sequence 1770 

MNQINIQFPDGNTKEFDKGTTTEDIAQSISPGLRKKAVAGKFNGQLVDLTRPLEQDGAIE 
IITPGSEEALEVLRHSTAHLMAQALKRLYGDVKFGVGPVIEGGFYYDFDMDDKVSSDDFD
KIEKTMKQIVNENHKIVREVVSKEKAKDFFKDDPYKLELIDAIPEDESVTLYTQGEFTDL 
CRGVHVPSTSKIKEFKLLSTAGAYWRGNSDNKMLQRIYGTAFFDKKDLKAHLKMLEERRE
RDHRKIGKDLELFTNNQLVGAGLPLWLPNGATIRREIERYIVDKEVSMGYDHVYTPVLAN 
VDLYKTSGHWDHYQEDMFPAMKLDEDEAMVLRPMNCPHHMMIYKNKPHSYRELPIRIAEL 
GTMHRYEASGAVSGLQRVRGMTLNDSHIFVRPDQIKEEFKRVVNMIQDVYKDFGFEDYRF
RLSYRDPEDKHKYFDDDEMWEKAESMLKEASDELGLTYEEAIGEAAFYGPKLDVQVKTAM 
GKEETLSTAQLDFLLPERFDLTYIGQDGEQHRPVVIHRGVVSTMERFVAFLTEETKGAFP 
TWLAPMQVEIIPVNIDLHYDYARLLQDELKSQGVRVEIDDRNEKMGYKIREAQMKKIPYQ
IVVGDQEVENQEVNVRKYGSEKQESVEKDEFIWNVIDEIRLKKHR* 


Sequence 1771 

Contig_0650_pos_5141_4719,

putative peptide of unknown function 

gtgttgtttatgtataaatcaattttattagcagccgatgggtcagaaaatagtttacgt 
tcagcacaggaagttttgaactttatagatgaaaatactatagttactttaattacagtt
gtaaatgttgaagaatcgaaaacagatgttttacatggtaaacaaggacatagtttaaca 
aatgaaagagaagacaaattatccagtataactgaactatttgtagaacataatgtaaat 
tatgaagtaaaaattgcacatggtcttcctgcagaaacagtggtttcagttgctaatagt 
ggtaaatatcaagcaattgttttagggtctcgtggtctaaatagtttacaagaaatggta 
ttgggtagcgtcagtcacaaagtggctaaacgttcaaaaattcccgttatcattgtaaaa
tag 

Sequence 1772 

VLFMYKSILLAADGSENSLRSAQEVLNFIDENTIVTLITVVNVEESKTDVLHGKQGHSLT 
NEREDKLSSITELFVEHNVNYEVKIAHGLPAETVVSVANSGKYQAIVLGSRGLNSLQEMV
LGSVSHKVAKRSKIPVIIVK* 

Sequence 1773 

Contig_0650_pos_1503_1072, 

putative peptide of unknown function

atgaagaatatggtaatcttgaataagcaaaaaaggatgatcagaatgaaaaaagcaata 
tttagtattattatttctcttattttagttctaactgctactggatgtagtaatagttct 
aaagaaaaaccaattaaaaaaagtgcattagaaattaatcctacaagtaaagctgttaat 
attacagtaaataaaaaagaaaataacaaacctgaaaaaattgggaaagtgtatcgatat
aaaaataacaatgcaaaagaaattactaacgacggtattaaaaaagatactaaagataca 
ttgatttggaaaggtgtagcaaacaaatacgataatgtaaaagatttattaggagaaagt 
attctttatgaagttaaatataaaaatggggatataaaaaaattcgagagaaaaattaaa 
tatactgaataa 


Sequence 1774 

MKNMVILNKQKRMIRMKKAIFSIIISLILVLTATGCSNSSKEKPIKKSALEINPTSKAVN 
ITVNKKENNKPEKIGKVYRYKNNNAKEITNDGIKKDTKDTLIWKGVANKYDNVKDLLGES 
ILYEVKYKNGDIKKFERKIKYTE* 


Sequence 1775 

Contig_0650_pos_627_256,

putative peptide of unknown function 

gtgtatattcattataatcacttagatcctaagcatgcgaatgatgaaaccatggatctg 
ttaaaattattacatttagatcaagttaaggatcatcatccttttgaaatatcaacaggt
caaaagcgtcgtttaagtgttgcaacagcattaagttcaaaggcagagattattttacta 
gatgaaccaacattcggcctagatagtcataatacatttcaacttattaagttatttcaa 
gaacgcgttaatcaaggtcaaacaattatcatggtgacacatgatccagaaattattaaa 
cgatatccaacaagacgattacgcgtggaagatggatgtcttaaagaaatggaaggtgaa 
cacattgtttga

Sequence 1776 

VYIHYNHLDPKHANDETMDLLKLLHLDQVKDHHPFEISTGQKRRLSVATALSSKAEIILL 
DEPTFGLDSHNTFQLIKLFQERVNQGQTIIMVTHDPEIIKRYPTRRLRVEDGCLKEMEGE 
HIV*

Sequence 1777 

Contig_0651_pos_1791_2105,

putative peptide of unknown function 

atgatacgtctttccattcgactgagtttgacttcgcttcttttaaaagctaatgttaat
tcttttctaattatcgatctttctatatcgtttaatgccaaagtagcgttatattcaaca 
atgtattccttaccgacagcttcccattttatagatactgttatgataattccgagtaaa 
gggacaatggacaagattaacatagtattcgttagacctactgcagcaagaatgattgga 
aagacaaaagtacctatgatactaccagtacgacttacagattctacaaagccagtagct 

tgtgaacgtaaatga

Sequence 1778 

MIRLSIRLSLTSLLLKANVNSFLIIDLSISFNAKVALYSTMYSLPTASHFIDTVMIIPSK
GTMDKINIVFVRPTAARMIGKTKVPMILPVRLTDSTKPVACERK* 


Sequence 1779 

Contig_0651_pos_4 67 0 0,

is similar to (with p-value 0.0e+00) 

>gp:gp|AJ005646|SAU5646_1 Staphylococcus aureus sdrD gene. N
ID: g3550593.

atgaaaaagagaagacaaggaccaattaacaagagagtggattttctatccaacaaggta 
aacaagtactcgattaggaagttcacagtaggtacagcttcaatactcgtgggtgctacg 
ttaatgtttggtgccgcagacaatgaggctaaagcggctgaagacaatcaattagaatca 
gcttcaaaagaagaacagaaaggtagtcgtgataatgaaagctcaaaacttaatcaagtc 

gatttagacaacggatcacatagttctgagaaaacaacaaatgtaaacaatgcaactgaa
gtaaaaaaagttgaagcaccaacgacaagtgacgtatctaagcctaaagctaatgaagca 
gtagtgacgaacgagtcaactaaaccaaaaacaacagaagcaccaactgttaatgaggaa 
tcaatagctgaaacacccaaaacctcaactacacaacaagattcgactgagaagaataat 
ccatctttaaaaqataatttaaattcatcctcaacgacatctaaagaaagtaaaacagac 
gaacattctactaagcaagctcaaatgtctactaataaatcaaatttagacacaaatgac
tctccaactcaaagtgagaaaacttcatcacaagcaaataacgacagtacagacaatcag 
tcagcaccttctaaacaattagattcaaaaccatcagaacaaaaagtatataaaacaaaa 
tttaatgatgaacctactcaagatgttgaacacacgacaactaaattaaaaacaccttct 
atttcaacagatagttcagtcaatgataagcaagattacacacgaagtgctgtagctagt
ttaggtgttgattctaatgaaacagaagcaattacaaatgcagttagagataatttagat 
ttaaaagctgcatctagagaacaaatcaatgaagcaatcattgctgaagcactaaaaaaa 
gacttttctaaccctgattatggtgtcgatacgccattagctctgaacacatctcaatca 
aaaaattcaccacataagagtgcaagtccacgcatgaatttaatgagtttagctgctgag 

cctaatagtggtaaaaatgtgaatgataaagttaaaatcacaaaccctacgctttcactt
aataagagtaataatcacgctaataacgtaatatggccaacaagtaacgaacaatttaat 
ttaaaagcaaattatgaattagatgacagcataaaagagggagatacttttactattaag 
tatggtcagtatattagaccgggtggtttagaacttcctgcaataaaaactcaactacgt 
agtaaggatggctctattgtagctaatggtgtatatgataaaactacaaatacgacgact 

tatacatttactaactatgttgatcaatatcaaaatattacaggtagttttgatttaatt
gcgacgcctaagagggaaacagcaattaaggataatcagaattatcctatggaagtgacg 
attgctaacgaagtagtcaaaaaagacttcattgtggattatggtaataaaaaggacaat 
acaactacagcagcggtagcaaatgtggataatgtaaataataaacataacgaagttgtt 
tatctaaaccaaaataaccaaaatcctaaatatgctaaatatttctcaacagtaaaaaat 

ggtaaatttataccaggtgaagtgaaagtttacgaagtgacggataccaatgcgatggta
gatagcttcaatcctgatttaaatagttctaatgtaaaagatgtgacaagtcaatttaca
cctaaagtaagtgcagatggtactagagttgatatcaattttgctagaagtatggcaaat 
ggtaaaaagtatattgtaactcaagcagtgagaccaacgggaactggaaatgtttatacc 
gaatattggttaacaagagatggtactaccaatacaaatgatttttatcgtggaacgaag 

tctacaacggtgacttatctcaatggttcttcaacagcacagggggataatcctacatat
agtctaggtgactatgtatggttagataaaaataaaaacggtgttcaagatgatgatgag 
aaaggtttagcaggtgtttatgttactcttaaagacagtaacaatagagaattacaacgt 
gtaactactgatcaatctggacattatcaatttgataatttacaaaatggaacgtacaca 
gtcgagtttgcgattcctgataattatacgccatctcccgcaaataattctacaaatgat

gcaatagattcagatggtgaacgtgatggtacacgtaaagtagttgttgccaaaggaaca
attaataatgctgataatatgactgtagatactggcttttatttaactcctaaatacaat 
gtcggagattatgtatgggaagatacaaataaagatggtatccaagatgacaatgaaaag 
ggaatttcaaatgtcaaagtgacgttaaaaaataaaaatggagataccattgggacaacg 
acaacagattcaaatggtaaatatgaattcacaggtttagagaacggggattacacaata 

gaatttgagacgccggaaggctacacaccgactaaacaaaactcgggaagtgacgaaggt
aaagattcaaatggtacgaaaacaacagtcacagtcaaagatgcagataataaaacaata 
gactcaggtttctacaagccaatatataacttaggtgactatgtatgggaagatacaaat 
aaagatggtattcaagacgacagtgaaaaagggatttctggtgttaaagtgacgttaaaa 
gataaaaatggaaatgccattgggacaacgacaacagacgcaagtggtcattatcaattt

aaaggattagaaaatggaagctacacagttgagtttgagacaccatcaggttatacaccg
acaaaagcgaattcaggtcaagatataactgtagattccaacggtataacaacaacaggt 
atcattaacggagctgataatctcacaattgatagtggtttctacaaaacaccaaaatat 
agtgtcggagattatgtatgggaagatacaaataaagatggtatccaagatgacaatgaa 
aaaggaatttctggtgttaaagtaacgttaaaggatgaaaaaggaaatataattagcact 

acaacaactgatgaaaatgggaagtatcaatttgataatttagatagtggtaattacatt
attcattttgagaaaccggaaggcatgactcaaactacagcaaattctggaaatgatgat 
gaaaaagatgctgatggggaagatgttcgtgtaacgattactgatcatgatgactttagt 
atagataatggttattttgacgatgattcagacagtgactcagacgcagatagtgattca 
gactccgacagtgactcggacgcagacagcgattctgacgcagac 


Sequence 1780 

MKKRRQGPINKRVDFLSNKVNKYSIRKFTVGTASILVGATLMFGAADNEAKAAEDNQLES 
ASKEEQKGSRDNESSKLNQVDLDNGSHSSEKTTNVNNATEVKKVEAPTTSDVSKPKANEA 
VVTNESTKPKTTEAPTVNEESIAETPKTSTTQQDSTEKNNPSLKDNLNSSSTTSKESKTD 
EHSTKQAQMSTNKSNLDTNDSPTQSEKTSSQANNDSTDNQSAPSKQLDSKPSEQKVYKTK
FNDEPTQDVEHTTTKLKTPSISTDSSVNDKQDYTRSAVASLGVDSNETEAITNAVRDNLD 
LKAASREQINEAIIAEALKKDFSNPDYGVDTPLALNTSQSKNSPHKSASPRMNLMSLAAE 
PNSGKNVNDKVKITNPTLSLNKSNNHANNVIWPTSNEQFNLKANYELDDSIKEGDTFTIK 
YGQYIRPGGLELPAIKTQLRSKDGSIVANGVYDKTTNTTTYTFTNYVDQYQNITGSFDLI 
ATPKRETAIKDNQNYPMEVTIANEVVKKDFIVDYGNKKDNTTTAAVANVDNVNNKHNEVV
YLNQNNQNPKYAKYFSTVKNGKFIPGEVKVYEVTDTNAMVDSFNPDLNSSNVKDVTSQFT 
PKVSADGTRVDINFARSMANGKKYIVTQAVRPTGTGNVYTEYWLTRDGTTNTNDFYRGTK
STTVTYLNGSSTAQGDNPTYSLGDYVWLDKNKNGVQDDDEKGLAGVYVTLKDSNNRELQR 
VTTDQSGHYQFDNLQNGTYTVEFAIPDNYTPSPANNSTNDAIDSDGERDGTRKVVVAKGT
INNADNMTVDTGFYLTPKYNVGDYVWEDTNKDGIQDDNEKGISNVKVTLKNKNGDTIGTT 
TTDSNGKYEFTGLENGDYTIEFETPEGYTPTKQNSGSDEGKDSNGTKTTVTVKDADNKTI 
DSGFYKPIYNLGDYVWEDTNKDGIQDDSEKGISGVKVTLKDKNGNAIGTTTTDASGHYQF
KGLENGSYTVEFETPSGYTPTKANSGQDITVDSNGITTTGIINGADNLTIDSGFYKTPKY 
SVGDYVWEDTNKDGIQDDNEKGISGVKVTLKDEKGNIISTTTTDENGKYQFDNLDSGNYI
IHFEKPEGMTQTTANSGNDDEKDADGEDVRVTITDHDDFSIDNGYFDDDSDSDSDADSDS 
DSDSDSDADSDSDAD 

Sequence 1781

Contig_0651_pos_1050_601,

is similar to (with p-value 1.0e-71) 

>gp:gp|U96619|SAU96619_2 Staphylococcus aureus NCTC 8325 Sec 
E (secE), NusG (nusG) and RplK (rplK) genes, complete cds. N
ID: g2078375. 

atgaatatgactgaacaaatttttagagttgtcataccagaagaggaagaaactcaagtt
aaggatgggaaagctaaaaagattgtgaagaaaacatttcctggatatgtattagttgag 
ttaatcatgacagatgagtcgtggtatgtagttagaaatactcctggagtaacaggattt 
gtcggatctgcaggtgcaggatcaaaacctaaccctctacttcctgaagaagtacgcttc 
attcttaagcaaatgggtcttaaagagaaaacaatagatgttgaactcgatgttggggaa 

caagttcgtatccaatcaggtccttttgctaatcaaattggagaagtacaagagattgaa
gcggataaattcaagcttactgtacttgttgatatgtttggtcgtgaaacacctgtagaa 
gttgaatttgaccaaattgaaaaattataa 

Sequence 1782 

MNMTEQIFRVVIPEEEETQVKDGKAKKIVKKTFPGYVLVELIMTDESWYVVRNTPGVTGF
VGSAGAGSKPNPLLPEEVRFILKQMGLKEKTIDVELDVGEQVRIQSGPFANQIGEVQEIE 
ADKFKLTVLVDMFGRETPVEVEFDQIEKL* 

Sequence 1783

Contig_0652_pos_3965_4576,

is similar to (with p-value 8.0e-25) 

>gp:gp|AL031124|SC1C2_14 Streptomyces coelicolor cosmid 1C2.
NID: g3355667. 

atgagagtaaaaaaattgataagtcatcttactgtatctagaataacaagtacaagaaac 
gtaattattgtaataaagagagaaggggttataataatgacaaaatttaactttgatcaa
gttcacagtgatattcagtttaaaattaaacatcttatggtgtcccaagtaaaaggaaca 
tttaagcaattcgatgttcaattagatggagatattaatgatttaacttcactaaaagca 
acagctactattattccaagttcaattgacactcaaaatgaggacagagacaaccattta 
agatcaaacgatttctttggtacagaagacaacgataaaatgacatttgtaactaaagaa 
attaacgaaaatcaagtagttggagatttgacaattaaaggtgaaactcatgaagagaca
tttgatgttgaatttaatggtgtaagtaaaaatccaatgaatggacaacaagtcactggt 
tttatcgttagtggaacaattaaccgcgaaaaatatggtattaattttaaccaagcttta 
gaaactggtggcgtgatgttaggtaaaaacgtaaaatttgaagcatcagcagaatttagc 
atcgacaattaa 


Sequence 1784 

MRVKKLISHLTVSRITSTRNVIIVIKREGVIIMTKFNFDQVHSDIQFKIKHLMVSQVKGT 
FKQFDVQLDGDINDLTSLKATATIIPSSIDTQNEDRDNHLRSNDFFGTEDNDKMTFVTKE 
INENQVVGDLTIKGETHEETFDVEFNGVSKNPMNGQQVTGFIVSGTINREKYGINFNQAL 
ETGGVMLGKNVKFEASAEFSIDN*

Sequence 1785 

Contig_0652_pos_6095_7057, 

is similar to (with p-value 0.0e+00)

>gp:gp|U92974|LLU92974_13 Lactococcus lactis unknown gene, p
artial cds, and HisC (hisC), unknown, HisG (hisG), unknown,
HisB (hisB), unknown, HisH (hish), HisA (hisA), HisF (hisF),
HisIE (hisIE), unknown, unknown, LeuA (leuA), LeuB (leuB),
LeuC (leuC), LeuD (leuD), unknown, IlvD (ilvD), IlvB (ilvB),
IlvN, IlvC (ilvC), IlvA (ilvA), AldB (aldB) and aldR (aldR)
genes, complete cds. NID: g2565137.

atggattatagagtactactttattataaatatgtaactatagatgaccctgaaactttt 
gcagccgaacatttgaaattttgtaaggaacatcatttaaaaggaagaatactagtttca 

acggaaggcattaatggaacattatctggaacaaaagaagatactgataaatatatagag
catatgcatgcagatagtcgttttgctgatttaacttttaaaattgatgaagctgaaagt 
catgcgtttaaaaagatgcacgtgcgtccaagacgtgaaattgttgcacttgacttagaa 
gaagatattaatccacgtgaaattaccggtaaatactattctcctaaagaatttaaagcc 
gcactagaagatgaaaatactgttatattagatgctcgaaatgattatgaatacgattta 

ggacatttccgtggagctattcgtcctgatataacacgattccgtgacttacctgaatgg
gtgcgtaataataaagaacaactcgacggaaaaaatattgtcacatattgtacaggtggc 
attcgttgtgaaaaattttctggttggttagtaaaagaaggatttgaaaacgtaggtcag 
ttgcatggtggtattgctacatacggtaaagaccctgaaactaaagggctatattgggat 
ggtaagatgtatgtatttgatgaacgtattagtgtcgatgtgaatcaaattgataaaaca 

gtcatcggcaaagagcattttgatggtacaccttgtgaacgttatattaattgtgcaaac
cctgaatgtaataaacaaattcttgtttctgaagaaaatgaagaaaaatatttaggtgca 
tgttcgtatgattgtgcaaaacatgagcgcaatcgctacgttgcccgtcatcatattagc 
aatgaagaatggcaacgtcgtttaaataatttcaaagatgtgcctgaacacacacatgca 
taa 


Sequence 1786 

MDYRVLLYYKYVTIDDPETFAAEHLKFCKEHHLKGRILVSTEGINGTLSGTKEDTDKYIE 
HMHADSRFADLTFKIDEAESHAFKKMHVRPRREIVALDLEEDINPREITGKYYSPKEFKA
ALEDENTVILDARNDYEYDLGHFRGAIRPDITRFRDLPEWVRNNKEQLDGKNIVTYCTGG 
IRCEKFSGWLVKEGFENVGQLHGGIATYGKDPETKGLYWDGKMYVFDERISVDVNQIDKT
VIGKEHFDGTPCERYINCANPECNKQILVSEENEEKYLGACSYDCAKHERNRYVARHHIS 
NEEWQRRLNNFKDVPEHTHA* 

Sequence 1787

Contig_0652_pos_8373_7876,

putative peptide of unknown function 

gtgaatgtgcataaaatagatttatcaggcaacaaatttcaaatccaacgatttgttctg 
ttgcaaattgtattggcgctatttacaatactatttacttataaatgggcttatcaaaca 
acgcatatcattgaacaaaatcttgtcatgaatcttatttttggatttgtaggtttcgca 

gtactagttattttgcacgagtttattcatcgtattttgttcattatattttctaaaggt
gaaaaaccatctttaaaatatgataaaaacaaaattattgtacagttctctcagacttgt 
tttcatcggtggcaatttacaattatcatgatagcaccacttgttatcataagtgcgacc 
ttactagcacttattcatacaggaatatcaagaggaggttgtatacctcttgctaaacca 
tcaccaaaagttaataatattgctcctaccaaacctgacatgattataacatgtatattt 

ttatggccgacaagttga

Sequence 1788 

VNVHKIDLSGNKFQIQRFVLLQIVLALFTILFTYKWAYQTTHIIEQNLVMNLIFGFVGFA 
VLVILHEFIHRILFIIFSKGEKPSLKYDKNKIIVQFSQTCFHRWQFTIIMIAPLVIISAT
LLALIHTGISRGGCIPLAKPSPKVNNIAPTKPDMIITCIFLWPTS*

Sequence 1789 

Contig_0652_pos_5633_4836,

putative peptide of unknown function

gtgtttgaagggaatatttttaaagttaatcccgcaacaaaagaggttacaacaaaattt
cagtctgttaaagataatccggcagcgattaaagtacataaagatggtcgtttatttatc 
tgttatctaggtgattttaagacaactggaggcatatttgcgacaacagaaaaaggtgaa 
caaatagaagaaattatttctgatttaaatacagaatattgtattgatgacatggttttt 
gacagtaaaggcggattttatttcactgattttagagggtattctacacaacctttgggc 
ggtgtttactatgtagatccagactttaagacggttacgccaattattcaaaatatttct
gtggcgaatggtattgctttaagtacggatgaaaaagtgctatgggtaactgaaactaca 
actaatcgacttcaccgaatcgcattagaggatgatggcgtgactattgcaccatttgga 
gcgacaataccatattattttacaggtcatgaaggaccggattcttgttgtattgatagt 
aatgataatttatatgtggctatgtatggccaaggacgtgtattagttttcaataagaga
ggttatcctataggtcaaattttaatgccaggacgtgatgatggaaagatgttacgtaca 
acacatccacaatttatacctggtacaaatcaacttataatttgtactaatgatattgaa 
aaccattctgaaggtggatctatgctttatacagttaatggttttgctaaaggatatgag 
agttatcaatttcaataa 


Sequence 1790 

VFEGNIFKVNPATKEVTTKFQSVKDNPAAIKVHKDGRLFICYLGDFKTTGGIFATTEKGE 
QIEEIISDLNTEYCIDDMVFDSKGGFYFTDFRGYSTQPLGGVYYVDPDFKTVTPIIQNIS
VANGIALSTDEKVLWVTETTTNRLHRIALEDDGVTIAPFGATIPYYFTGHEGPDSCCIDS 
NDNLYVAMYGQGRVLVFNKRGYPIGQILMPGRDDGKMLRTTHPQFIPGTNQLIICTNDIE
NHSEGGSMLYTVNGFAKGYESYQFQ* 

Sequence 1791 
Contig_0652_pos_995_477, 

putative peptide of unknown function

atgttagaaacacatagattaaagctagtgaagcctaatttgagttatacagatgaactt 
tatcaattgcatacaaataaggtagctacaaagtatacacctaaaggtattcatcagaat 
aaagtagcaacccaagattttattaaaggatggatgaggcattgggatgaatatcaattt 
ggttacttcattttaattatgagagataatcacgaagtagtggggatagcgggatttgag

tatcgtacaattcatcaacaacagtttcttaatgcgtattatagaatctttccatcgtat
actggtgttggtttagcttttgagtcaatggaggagattgcccgtcatttaaaaaagcat 
gataccataacaccaaaattaattcgaacaaatcaatataatacaaattctattaaatta 
gcacaaaaactcggatataattatgatgctaactgggacgatgtaattaataaaggagat 
cgttgtttttttaacctacaagcgttggataataactaa 


Sequence 1792 

MLETHRLKLVKPNLSYTDELYQLHTNKVATKYTPKGIHQNKVATQDFIKGWMRHWDEYQF 
GYFILIMRDNHEVVGIAGFEYRTIHQQQFLNAYYRIFPSYTGVGLAFESMEEIARHLKKH 
DTITPKLIRTNQYNTNSIKLAQKLGYNYDANWDDVINKGDRCFFNLQALDNN*


Sequence 1793 

Contig_0653_pos_4048_0,

is similar to (with p-value 1.0e-21) 

>sp:sp|P37965|GLPQ_BACSU GLYCEROPHOSPHORYL DIESTER PHOSPHODI
ESTERASE (EC 3.1.4.46) (GLYCEROPHOSPHODIESTER PHOSPHODIESTER
ASE). >pir:pir|S37251|S37251 glycerophosphoryl diester phosp
hodiesterase - Bacillus subtilis >gp:gp|Z26522|BSGLPTQ_2 B.s
ubtilis glpT and glpQ genes for glycerol 3-phosphate permeas
e and glycerophosphoryl diester phosphodiesterase, NID: g403
371. >gp:gp|Z99105|BSUB0002_42 Bacillus subtilis complete ge
nome (section 2 of 21): from 194651 to 415810. NID: g2632457
. >gp:gp|AB006424|AB006424_43 Bacillus subtilis genomic DNA,
70 kb region between 17 and 23 degree. NID: g3599592.

gtgactttgatgaaatttcgacgtccaaatcaacatttccaaatcgtagcgcatagagga 

ttacctgaagattatcctgaaaatactattatcgcttatcgacatgcgctcatgttacat
atagatatgttggaaattgatgtacattacacaaaagataaagaacttgtcgttatacat
gatgatactatcgatcgtacgtcaaatggtaaaggtaaggtttctgattttactttaaaa
gaattaaaagcgttagattttggtttctataaaggagagaaatttaaaggggagagtata 
ccgacttttgatgaagtgttagatttagcagataacttttcacaaaaattattaatagaa 

ataaaaaagcctagtcagtatccaaatattgaaaatatgattgttgataaattgaaggaa
agacaaatatctaaatctaaagtgattttacaatcattcgattttgattgtgtaaaaaaa 
ttgtcagcaatgaatttagattatgaattaggtttattaattagtaagaaaaaatattgg 
cacaagttaccaaatttcaaaaaaattgccaaagttgctgattatgctaatcctaattat 
caaattg 

Sequence 1794

VTLMKFRRPNQHFQIVAHRGLPEDYPENTIIAYRHALMLHIDMLEIDVHYTKDKELVVIH
DDTIDRTSNGKGKVSDFTLKELKALDFGFYKGEKFKGESIPTFDEVLDLADNFSQKLLIE
IKKPSQYPNIENMIVDKLKERQISKSKVILQSFDFDCVKKLSAMNLDYELGLLISKKKYW
HKLPNFKKIAKVADYANPNYQIX 

Sequence 1795 
Contig_0653_pos_3981_3541,

putative peptide of unknown function

atggttagacatgattttaaagttaaaactgaatggctaggtggacgcgaagaagtaggt 
aaacttcgaggagatattatcaatgaaaatatatccatcccctcttcactcggtggccaa 
ggagaaggcacaaatccagatgaattactagtaagtgcagcatcatcttgttacatcatt 
tcactcgctgcaacactagaaaagagtggttttactaatgtaaaaattaatcaacagtca 

ataggtacagcctcgtttgaaaataaaaaatttaaaatggaacgtattacacattatccc
tcaattaaagtaccctcttctcaaacagaaaagcttaaaagtattttagataaattatta 
gtgattgcagataataattgtatgatatctaatgcaatacggaataatgtcattatttca 
attgaacctaatttaatataa 

Sequence 1796

MVRHDFKVKTEWLGGREEVGKLRGDIINENISIPSSLGGQGEGTNPDELLVSAASSCYII 
SLAATLEKSGFTNVKINQQSIGTASFENKKFKMERITHYPSIKVPSSQTEKLKSILDKLL 
VIADNNCMISNAIRNNVIISIEPNLI*

Sequence 1797

Contig_0653_pos_2142_667,

is similar to (with p-value 0.0e+00) 

>gp:gp|Z99116|BSUB0013_19 Bacillus subtilis complete genome
(section 13 of 21): from 2395261 to 2613730. NID: g2634723.
>gp:gp|L47648|BACSERA_1 Bacillus subtilis phosphoglycerate d
ehydrogenase (serA), ypaA, ferredoxin (fer), ypbB, recS, ypb
D, ypbE, ypbF, ypbG, ypbH, glutamate dehydrogenase (ypcA), y
pdA, ypdB, ypdC, spore cortex lytic enzyme (sleB), ypeB, ypf
A, ypfB, cytidine monophosphate kinase (cmk), ypfD, ypgA, yp
hA, yphB, yphC, NAD+ dependent glycerol-3-phosphate dehydrog
enase (glyc), yphE and yphF genes, complete cds. NID: g11461
95. >gp:gp|L47648|BACSERA_1 Bacillus subtilis phosphoglycera
te dehydrogenase (serA), ypaA, ferredoxin (fer), ypbB, recS,
ypbD, ypbE, ypbF, ypbG, ypbH, glutamate dehydrogenase (ypcA
), ypdA, ypdB, ypdC, spore cortex lytic enzyme (sleB), ypeB,
ypfA, ypfB, cytidine monophosphate kinase (cmk), ypfD, ypgA
, yphA, yphB, yphC, NAD+ dependent glycerol-3-phosphate dehy
drogenase (glyc), yphE and yphF genes, complete cds. NID: g1
146195.

atgatttcaacttatgatgctcttatcgtacgaagtcaaacccaagtaacagagcgaatt
attaatgctgcaacaaatttgaaggtcattgcaagagctggtgtaggtgtggataatatt 
aatatagaagcagcgactttaaaaggtattttagtaattaatgctcctgatggtaataca 
atttctgctacagaacattcagtagctatgttgcttgcaatggcacgaaatattcctcaa 
gcacaccaatctttacgtaacaaagaatggaatcgtaaagcatttagaggggttgaactt 

tatggcaaaaccttaggtgttatcggtgctggtaggattggtttgggcgtcgctaaacgt
gcgcagagtttcggtatgaaaattttagcgttcgatccttatttaacagaagataaagcg 
aagtcattagatattcaaattgcaactgttgatgaaattgccgaaaaatccgactttgta 
acagttcacacaccattaacacctaaaactcgaggaattgttggttcatctttctttaac 
aaagctaaacaaaacttacaaatcataaatgttgccagagggggtattatagatgaaact 

gcacttattgaagcattagataataacttaatagatcgtgcagctattgacgtatttgaa
catgaacctcctactgattcccctctcattcaacatgataaaattattgtcacaccacat 
cttggcgcctctactgtagaagcgcaagagaaggttgcagtctctgtatctgaagaaata 
attgaaattctaactaaagggaatgttgagcatgctgtgaatgctccaaaaatggattta 
agcaaagttgataaaacaactcaaagctttataggtttaagtacaactattggtgagttt 
gctattcagcttctcgatggtgctccgagtgaaattaaagttaaatatgctggtgactta
gcgcaaaatgacactagtttaattacaagaacaattataacgaacatcttgaaagaagat 
ttaggtaatgaagtcaatattattaatgcattagcaatacttaaccaacaaggtgtcacg 
tataatatagaaaaacaaaagaaacattctggctttagtagttacattgagctagaacta 

gttaatgatcaagataaaatcaaaattggcgcaacggtattcgcaggttttggcccaaga
atagtacgtattaatgattactcacttgattttaaacctaaccaatatcaattagtaaca 
tgtcataaagataaacctggtatagtaggacaaacaggcaacctattgggaagtcacgga 
attaatattgcgtcaatgactttaggacgtaacgatgctggtggagatgctttaatgatt 
ctttctattgatcaacaagcaagtgaggaagttataaaaattttaaatgaaacaagcgga 

ttcaacaaaattattagcactaagttaacaatttga

Sequence 1798 

MISTYDALIVRSQTQVTERIINAATNLKVIARAGVGVDNINIEAATLKGILVINAPDGNT
ISATEHSVAMLLAMARNIPQAHQSLRNKEWNRKAFRGVELYGKTLGVIGAGRIGLGVAKR 

AQSFGMKILAFDPYLTEDKAKSLDIQIATVDEIAEKSDFVTVHTPLTPKTRGIVGSSFFN
KAKQNLQIINVARGGIIDETALIEALDNNLIDRAAIDVFEHEPPTDSPLIQHDKIIVTPH
LGASTVEAQEKVAVSVSEEIIEILTKGNVEHAVNAPKMDLSKVDKTTQSFIGLSTTIGEF
AIQLLDGAPSEIKVKYAGDLAQNDTSLITRTIITNILKEDLGNEVNIINALAILNQQGVT
YNIEKQKKHSGFSSYIELELVNDQDKIKIGATVFAGFGPRIVRINDYSLDFKPNQYQLVT 

CHKDKPGIVGQTGNLLGSHGINIASMTLGRNDAGGDALMILSIDQQASEEVIKILNETSG
FNKIISTKLTI* 

Sequence 1799 
Contig_0657_pos_695_1033, 

putative peptide of unknown function

atgctaggatttgcagggggattgggatacagtcattataaagattcaaaatcgaacact 
gatgtagcttcaaaagagactcagacttccaataaaaacactcatgaagatacaacttca 
caaggtaaaatgcaaaatcaagttaatagccaaacaaacgaagtatcaaatgggacatca 
actaaaacacttagtgaaaaagcaaagcagttaagagaagcttttaacgtcaatgatgag

gaagctcaaattttagcagatgaaatcgatagagcagatgtaaataaagatggcacgatt
acaacggatgaaatgacgcctacttattttcatatataa 

Sequence 1800 

MLGFAGGLGYSHYKDSKSNTDVASKETQTSNKNTHEDTTSQGKMQNQVNSQTNEVSNGTS 
TKTLSEKAKQLREAFNVNDEEAQILADEIDRADVNKDGTITTDEMTPTYFHI*

Sequence 1801 

Contig_0657_pos_2041_2544,

putative peptide of unknown function 

atggtacttaggacttctttttttaaattcctcaaattccttcatttcttcctcattcat
gtcattaaatcgattgatcacttcacctgtacttttgtttactacagcacaaaattttat
aggagttccggctgcatctcttacaaggtgttcaacataatattcattagcattgcttct 
atctatatttgttttaaattcaagaacttgttgcttattaatccctttattaatgtaata 
gttcgagacaatttgttctgcttcttgagcggaaatttgtttttgagctgttttttgttt 

ttgattactattagaatttgttacgctcccattagtttgcatattatccgttttattttg
ttcttcttttggattgctttctgtttttttttacattttcttgtccacatgctgttaata 
ttattgacgagagtgctaatgttcctaatagtttcaatttcattttatatccctccgttt 
aaaatgttgttaaagttcacctaa 

Sequence 1802

MVLRTSFFKFLKFLHFFLIHVIKSIDHFTCTFVYYSTKFYRSSGCISYKVFNIIFISIAS 
IYICFKFKNLLLINPFINVIVRDNLFCFLSGNLFLSCFLFLITIRICYAPISLHIIRFIL
FFFWIAFCFFLHFLVHMLLILLTRVLMFLIVSISFYIPPFKMLLKFT* 

Sequence 1803

Contig_0657_pos_5118_0,

putative peptide of unknown function 

gtgcctatcaaaattaataatccagtggtagttaacacgagtttagagtgtaaggataat 
tttctaaaacttttggcgttccacaaatcaaccacgactaaatgtcccaaacctcccaaa 
atgataagtattggaatagtgataatgattaccggatcatttgaaaaatcgattaagttg
tttttaaaaagggcgaatcctgcgttgttaaatgcggaaactgaagtgaataaacttaaa 
aatagacctttacctatgccaaattttggaataaacgataaacatagacaaagtgtacca 
aataattcagtggcgatgctgtatatggctagatgtttaataagt 


Sequence 1804 

VPIKINNPVVVNTSLECKDNFLKLLAFHKSTTTKCPKPPKMISIGIVIMITGSFEKSIKL
FLKRANPALLNAETEVNKLKNRPLPMPNFGINDKHRQSVPNNSVAMLYMARCLIS 

Sequence 1805

Contig_0657_pos_2384_1962,

putative peptide of unknown function 

atgcaaactaatgggagcgtaacaaattctaatagtaatcaaaaacaaaaaacagctcaa 
aaacaaatttccgctcaagaagcagaacaaattgtctcgaactattacattaataaaggg 

attaataagcaacaagttcttgaatttaaaacaaatatagatagaagcaatgctaatgaa
tattatgttgaacaccttgtaagagatgcagccggaactcctataaaattttgtgctgta 
gtaaacaaaagtacaggtgaagtgatcaatcgatttaatgacatgaatgaggaagaaatg 
aaggaatttgaggaatttaaaaaaagaagtcctaagtaccataattctggtgaagaaaaa 
aataaaatgcaagaaaattcttcatcttccgaacaacaagaagtccacaattctgttata 

taa

Sequence 1806 

MQTNGSVTNSNSNQKQKTAQKQISAQEAEQIVSNYYINKGINKQQVLEFKTNIDRSNANE
YYVEHLVRDAAGTPIKFCAVVNKSTGEVINRFNDMNEEEMKEFEEFKKRSPKYHNSGEEK 
NKMQENSSSSEQQEVHNSVI*

Sequence 1807 

Contig_0658_pos_292_831,

is similar to (with p-value 2.0e-18)

>sp:sp|P37811|ATPD_BACSU ATP SYNTHASE DELTA CHAIN (EC 3.6.1.
34). >pir:pir|S39253|S39253 H+-transporting ATP synthase (EC
3.6.1.34) delta chain - Bacillus subtilis >gp:gp|Z28592|BSA
TPASE_5 B.subtilis (168) atpase genes for ATP synthase subun
its i, a, c, b, delta, alpha, gamma, beta, epsilon. NID: g4
33983. >gp:gp|Z99122|BSUB0019_181 Bacillus subtilis complete
genome (section 19 of 21): from 3597091 to 3809700. NID: g2
636029.

atggcaaaggtagcaaaaaaatatgccaaagcattatttgatgtcgctctagatacaaat 
caactagatgttgtctatgaagatttagaaacaattagccattcatcgtttgatttcatc 

aaacaacttaaagcaattgatagtaatccaagcttaactgcaaatcaacgtgaagaattt
gtagaaagagtttacaacgaagcaaatccatatgtggtaaatactttaaaagtattagca 
gataaccgacatatttcaattgtagagaatgtttttaaatcattccaaaatttatataac 
aaatactacaaacaagattttgcaattattgaatcgacttacgagttaagcgaagatgaa 
atatcaagaattgtagaacttatcaaaaagcaaactgaattatcaaatgtaattgttaac 

actaaaatcaatcaagatttaattggtggatttagagttaaggttggaactacagttatg
gatggtagtgttagaaatgaccttgttcaattacaaagaaaatttgaaagagctaactaa 



Sequence 1808 

MAKVAKKYAKALFDVALDTNQLDVVYEDLETISHSSFDFIKQLKAIDSNPSLTANQREEF
VERVYNEANPYVVNTLKVLADNRHISIVENVFKSFQNLYNKYYKQDFAIIESTYELSEDE
ISRIVELIKKQTELSNVIVNTKINQDLIGGFRVKVGTTVMDGSVRNDLVQLQRKFERAN* 



Sequence 1809

Contig_0658_pos_854_2365,

is similar to (with p-value 0.0e+00) 

>sp:sp|P17674|ATPA_BACME ATP SYNTHASE ALPHA CHAIN (EC 3.6.1.
34). >pir:pir|F31482|F31482 H+-transporting ATP synthase (EC
3.6.1.34) alpha chain - Bacillus megaterium >gp:gp|M20255|B
ACATPA_7 B. megaterium ATP synthase i, a, c, b, delta, alpha, gamma
,beta and epsilon subunit genes, complete cds, and ORF. NID:
g142553.

atggccataaaagctgaagaaatcagtgcattgcttcgctcacaaattgaaaattatgag
tcagaaatgtccgttacagatgttggtacagtactccaaattggtgacggtatcgcatta 
attcacggacttaacgacgttatggctggtgagctagtagaattccataacggtgttctt 
ggtttagcacaaaaccttgaagaatctaatgtgggtgtggttattttaggaccatatgaa 
gaaattagtgaaggtgacgaagttaaacgtactggccgaattatggaagtaccagtcgga 

gaggaaatgataggaagagttgttaatcctcttggacaacccattgacggacaaggtcca
atcaatgcgactaaaactcgtcctgtagagaaaaaagcaactggcgtaatggatcgtaaa 
tctgtagatgaaccattacaaacaggtatcaaagcaattgatgctttagtaccaattggc 
cgtggtcaacgtgaattaatcattggtgaccgtcaaactggtaaaacaactgttgcaatt 
gattcaatcttaaaccaaaaagatcaagatacaatttgtatttatgttgcaataggtcaa 

aaagattcaacagttcgtgcaaatgttgaaaaattaagacaagcaggtgctttagactac
acaatcgttgtatctgcatccgcagctgatccagcaccattactttatattgcaccttat 
tctggtgtaactatgggtgaagagttcatgtttaatggaaaacatgttcttatcgtttac 
gatgatttaactaaacaagcggcagcataccgtgagctatcattattattacgtagacca 
ccaggtcgtgaagcatatcctggggacgtgttctacttacacagtagattattagaaaga

gctgcaaaacttaacgatgatcttggaggcggttcaattactgctttaccaatcattgaa
actcaagctggcgatatctcagcatacgttccaacaaatgttatctcaattactgacgga 
caaatattcttacaatctgatttattcttctcaggtgttagaccagcgattaatgctggg 
caatcagtatctcgtgttggtggttcagctcaaattaaagcgatgaaaaaagttgcagga 
acattacgtttagacttagcttcatatcgtgagttagaatcatttgcgcaatttggttct 

gatttagatgaatttacagctaaaaaattagcgcgtggtgaacgtactgttgaagtatta
aaacaaggtcaaaataacccactgcctgtagaacatcaagtacttattatttttgcttta 
actaaaggttacttagatgatattcctgtccaagatatcaatcgttttgaagaggaattt 
aaccactgggctgagtcaaatgcaactgaattattaaatgaaattagagaaactggtgct 
ttaccagatgctgataaatttgattctgctatcacagaatttaaaaaaggatttaataaa 

tcagaagaataa

Sequence 1810 

MAIKAEEISALLRSQIENYESEMSVTDVGTVLQIGDGIALIHGLNDVMAGELVEFHNGVL 
GLAQNLEESNVGVVILGPYEEISEGDEVKRTGRIMEVPVGEEMIGRVVNPLGQPIDGQGP

INATKTRPVEKKATGVMDRKSVDEPLQTGIKAIDALVPIGRGQRELIIGDRQTGKTTVAI
DSILNQKDQDTICIYVAIGQKDSTVRANVEKLRQAGALDYTIVVSASAADPAPLLYIAPY
SGVTMGEEFMFNGKHVLIVYDDLTKQAAAYRELSLLLRRPPGREAYPGDVFYLHSRLLER
AAKLNDDLGGGSITALPIIETQAGDISAYVPTNVISITDGQIFLQSDLFFSGVRPAINAG
QSVSRVGGSAQIKAMKKVAGTLRLDLASYRELESFAQFGSDLDEFTAKKLARGERTVEVL 

KQGQNNPLPVEHQVLIIFALTKGYLDDIPVQDINRFEEEFNHWAESNATELLNEIRETGA
LPDADKFDSAITEFKKGFNKSEE* 

Sequence 1811 

Contig_0658_pos_2498_3316,

is similar to (with p-value 3.0e-78)

>sp:sp|P20602|ATPG_BACME ATP SYNTHASE GAMMA CHAIN (EC 3.6.1.
34). >pir:pir|G31482|G31482 H+-transporting ATP synthase (EC
3.6.1.34) gamma chain - Bacillus megaterium >gp:gp|M20255|B
ACATPA_8 B. megaterium ATP synthase i, a, c, b, delta, alpha, gamma
,beta and epsilon subunit genes, complete cds, and ORF. NID:
g142553.

atgaagcaaataacaaaggcgatgaacatggtttctagttcaaaattacgtagagctgag 
aaaaatactaaatcatttagaccttatatggaaaagatgcaagatgctattacagctgta 
gctggttcaaatagtacttctaatcatccaatgcttaaatctagagatattaaaagaagt
ggttacttagtaatcactagtgataaaggcttagccggtgcctatagtacaaatgtttta
aaaagcttagtaaacgatatcaattctaaacccaacgacagtagtgaatatagtctaatc 
gttttaggtcagcaaggtgtagatttcttcaaacatagaggatatgaaattgaaagttct 
ttagttgaagttccagatcaaccttcatttaaatctattcaatctatagctaaacatgct 
attgatttatttagcgaggaaaacatagatgaattgactatttattacagtcattatgtt
agtgtcttagaaaataaacctgcaactaaacaagttttaccattatctcaagaagattca
ggtcaaggacatggtcaaatgtcttcatacgaatttgaaccagataaagaatctatttta
agcgttattttgccacaatacgttgaaagcttaatctacggtacaatcttagatgcaaaa 
gctagtgaacatgcttcacgtatgacagcaatgagaaatgcttcagataatgcgacagaa 
ctaatcgatgatttatcattagaatacaatagagcgagacaagctgcgattactcaacaa
attactgaaattgttggtggatcatcagctcttgagtaa 

Sequence 1812 

MKQITKAMNMVSSSKLRRAEKNTKSFRPYMEKMQDAITAVAGSNSTSNHPMLKSRDIKRS 
GYLVITSDKGLAGAYSTNVLKSLVNDINSKPNDSSEYSLIVLGQQGVDFFKHRGYEIESS
LVEVPDQPSFKSIQSIAKHAIDLFSEENIDELTIYYSHYVSVLENKPATKQVLPLSQEDS
GQGHGQMSSYEFEPDKESILSVILPQYVESLIYGTILDAKASEHASRMTAMRNASDNATE
LIDDLSLEYNRARQAAITQQITEIVGGSSALE* 

Sequence 1813

Contig_0658_pos_3338_4750,

is similar to (with p-value 0.0e+00)

>sp:sp|P37809|ATPB_BACSU ATP SYNTHASE BETA CHAIN (EC 3.6.1.3
4). >pir:pir|S39256|S39256 H+-transporting ATP synthase (EC
3.6.1.34) beta chain - Bacillus subtilis >gp:gp|Z28592|BSATP
ASE_8 B. subtilis (168) atpase genes for ATP synthase subunit
s i, a, c, b, delta, alpha, gamma, beta, epsilon. NID: g433
983. >gp:gp|Z99122|BSUB0019_178 Bacillus subtilis complete g
enome (section 19 of 21): from 3597091 to 3809700. NID: g263
6029.

atgggaattggccgtgtaactcaagttatgggtccagtaatagatgttcgttttgaacat 
aacgaagtacctgagataaataatgcattacacatcgaagttcctaaagaagatggagcg 
cttcaattaacattagaagttgcacttcaactaggtgacgatgtagttcgtacaattgca 
atggactcaactgacggcgttcaaagaggaatggaagttaaagatacaggtagagacata 

agtgtacctgtcggtgacgtaactctaggaagagtgtttaacgtactaggagaaactatt
gacttagatgaaaaaattgatgattcagtacgacgtgaccctatccatagacaagctcca 
ggattcgacgaattatcaacaaaagtagaaatcttagaaactggtattaaagtagtagac 
ttattagcaccttacataaaaggtggtaaaattggattatttggtggtgccggtgtaggt 
aaaaccgtactaatccaagaacttattaataacatcgctcaagaacacggtggtatctca 

gtattcgctggtgttggtgaacgtacacgtgaaggtaatgatctttactatgaaatgagt
gacagtggtgttatcaagaaaactgcaatggtctttggtcaaatgaatgagccacctggt 
gcacgtatgcgtgtagcattatccggattaacaatggccgaatatttccgagatgaagaa 
ggccaagatgtgttattattcattgataacattttcagattcactcaagctggttcagaa 
gtttctgcgttattaggtcgtatgccatcagctgttggttatcaacctacacttgctaca 

gaaatgggtcaattacaagaacgtataagttcaacaaataaaggttcagttacatcaatt
caagctgttttcgtaccagccgatgactatactgaccctgcgccagcaacaacgttcgca 
cacttagattcaacaacaaacttagagcgtaaattaacagaaatgggtatttatccagct 
gtagacccgcttgcttctacatctagagctttggaaccttcagtagtaggtcaagagcat 
tatgatgtggcacgtgaagttcaatctactttacaaaaatatagagagttacaagatatt 

attgcgattcttggtatggatgaattatcagatgaagataaacaaactgtggaacgagca
cgtagaattcaattcttcttatcacaaaacttccacgttgcagaacaatttactggacaa 
aaaggttcatatgtacctgttaaaacaacagttgcagacttcagagatattttagatggt 
aagtatgaccatattcctgaagacgcattccgtttagtaggtagcatggaagacgtaatt 
gagaaagcaaaagatatgggtgttgaagtctaa 


Sequence 1814 

MGIGRVTQVMGPVIDVRFEHNEVPEINNALHIEVPKEDGALQLTLEVALQLGDDVVRTIA 
MDSTDGVQRGMEVKDTGRDISVPVGDVTLGRVFNVLGETIDLDEKIDDSVRRDPIHRQAP 
GFDELSTKVEILETGIKVVDLLAPYIKGGKIGLFGGAGVGKTVLIQELINNIAQEHGGIS
VFAGVGERTREGNDLYYEMSDSGVIKKTAMVFGQMNEPPGARMRVALSGLTMAEYFRDEE
GQDVLLFIDNIFRFTQAGSEVSALLGRMPSAVGYQPTLATEMGQLQERISSTNKGSVTSI
QAVFVPADDYTDPAPATTFAHLDSTTNLERKLTEMGIYPAVDPLASTSRALEPSVVGQEH
YDVAREVQSTLQKYRELQDIIAILGMDELSDEDKQTVERARRIQFFLSQNFHVAEQFTGQ
KGSYVPVKTTVADFRDILDGKYDHIPEDAFRLVGSMEDVIEKAKDMGVEV*

Sequence 1815

Contig_0658_pos_5233_5556,

putative peptide of unknown function

atgttaaaaagtaatatccatttttataagtatctttattgcctcttttcgtgtacaatg
ttatacagtaatcttacgaaggagcttaatatggattatcttggtcagttcgcaattgtt 
catttaatcttacatgttgtttgtatttgcgtagcttattgggctttaaattccataaaa 
ttagaccaattttttaaaaagggctacccattacaagttcaagtttgtatgatttttatt 
tctattttactgggtacggcagtcagtaactttatagttgatttattgcaatattcaact 
caagtgaaatacttgataaaataa

Sequence 1816 

MLKSNIHFYKYLYCLFSCTMLYSNLTKELNMDYLGQFAIVHLILHVVCICVAYWALNSIK 
LDQFFKKGYPLQVQVCMIFISILLGTAVSNFIVDLLQYSTQVKYLIK* 


Sequence 1817 

Contig_0658_pos_5666_6931, 

is similar to (with p-value 0.0e+00)

>gp:gp|Z81356|BSATPC_4 B.subtilis atpC gene. NID: g1648848.
>gp:gp|Z99122|BSUB0019_173 Bacillus subtilis complete genome
(section 19 of 21): from 3597091 to 3809700. NID: g2636029.

atggataaaatagtaataaatggtggaaatcgtttaacaggtgaagttaatgttgaagga
gctaaaaatgctgtattacctgtacttacagcgtcattacttgcttctgaaggacacagt
aaactagttaatgtcccagagttaagcgatgttgaaacaatcaataatgtattatctaca

ctcaatgcaaatgtagagtatgataaagataaaaatgcagttaaagttgatgcaactaaa
actttaaatgaagaagccccttatgaatatgtgagtaaaatgcgagcaagcatcttggtt 
atgggaccactacttgctcgactagggcatgctattgttgctttgccaggaggatgtgcg 
attggaacacgtcctatagagcagcacattaaaggatttgaagctttaggtgcagacatt 
catttagaaaatggaaatatttatgcaaatgctaaagatggattaaagggagcacatatt 

catttagattttccgagtgtaggtgcaactcaaaatattattatggcagcgtccttggca
tcaggaaaatctatcattgagaatgtcgctaaagaacctgaaattgttgatttagctaat 
tatattaatgaaatgggcggtaaaattacaggtgcaggtactgataccattacgatacat 
ggtgttgaaaaactttacggtgttgaacatgcaattataccagatagaattgaagctgga
acgttacttatcgcaggtgcaataactcgtggtgacatatttgtacgtggtgcaattaaa 

gaacatatggctagtttaatatataagcttgaagagatgggggtagatcttgaatattat
gaagaaggtataagagttacagcaaatggagatttaaatccagtagatgtaaaaacttta
ccacacccaggtttcccaactgatatgcaatctcaaatgatggctttattattaacagca
aatggacacaaagtgattactgagactgtttttgaaaatagatttatgcacgttgcagaa 
tttagaagaatgaatgcgaatataagtgttgaaggaagaagtgctaaaattgaaggaaaa 

agccatttacaaggtgctcaagttaaagcaacagatttaagagctgctgcagccttaatc
ttagctggtttagttgcagagggaactacgcaagtgactgagttaaagcatctagataga 
ggatacgtcaatttacatggaaaactaaaaagtctaggtgcaaacatagaacgtgtaaat 
cgataa 

Sequence 1818

MDKIVINGGNRLTGEVNVEGAKNAVLPVLTASLLASEGHSKLVNVPELSDVETINNVLST
LNANVEYDKDKNAVKVDATKTLNEEAPYEYVSKMRASILVMGPLLARLGHAIVALPGGCA
IGTRPIEQHIKGFEALGADIHLENGNIYANAKDGLKGAHIHLDFPSVGATQNIIMAASLA
SGKSIIENVAKEPEIVDLANYINEMGGKITGAGTDTITIHGVEKLYGVEHAIIPDRIEAG

TLLIAGAITRGDIFVRGAIKEHMASLIYKLEEMGVDLEYYEEGIRVTANGDLNPVDVKTL
PHPGFPTDMQSQMMALLLTANGHKVITETVFENRFMHVAEFRRMNANISVEGRSAKIEGK
SHLQGAQVKATDLRAAAALILAGLVAEGTTQVTELKHLDRGYVNLHGKLKSLGANIERVN
R* 

Sequence 1819

Contig_0659_pos_2951_4216,

is similar to (with p-value 0.0e+00)

>sp:sp|P00952|SYY_BACST TYROSYL-TRNA SYNTHETASE (EC 6.1.1.1)
(TYROSINE--TRNA LIGASE) (TYRRS). >pir:pir|A01179|SYBSYF tyr
osine--tRNA ligase (EC 6.1.1.1) - Bacillus stearothermophilu
s

atggctaatgctttaatagaagatttaaaatggagagggttaatttatcaacaaacagat 
gaagagggtatcgaagaattattaaataaggaacaggtcactttatattgcggcgcagat 
cctacagctgatagtttacatattggtcacttgttaccttttttaacattaagacgcttc
caagaacatgggcatcgtcctatcgtcttaattggtggaggtactggtatgataggagat 
ccttctggaaagtctgaagagcgcgtgttacaaacagaatcacaagttgaagctaacgtc 
aaaggcctgtctaatcaaatgcatcgattatttgaatttggcagtgataaaggggcaaaa 
ttagttaataataaagattggcttggtcaaatttcgttgattagttttcttagagattat 

ggtaaacatgttggcgttaactatatgctaggaaaagattctattcaaacacgtttagaa
catggtatctcttatacagaatttacttatactattttacaagctattgattttggctat 
ttaaatcgtgagttaaattgtaaaattcaagtaggcggatctgatcaatggggtaatatt 
acaagtggtattgaattaatgcgacgaatgtatggacaaactgaggcatatggcctaaca 
atcccattagtaactaaatcagatgggaagaaatttggtaaatcagagtctggagctgtg

tggttagatcctgaaaagacaagtccatatgaattttatcaattttggattaatcaatct
gacgaagatgtaattaaattcttaaaatattttacttttttagaaaaagaagaaattaat 
cgattagaacaatcaaaaaatgaagcgcctcacttacgtgaagcacagaaagcactagcg 
gaaaatgttacgaaatttattcatggtgaagcagctttaaaagacgctatacgtatttca 
aaagcactgtttagtggagatttaaaatcattatctgctaaagaacttaaagaagggttt

aaagacgtacctcaagtaacgctatctacaaaaacgacaaatatagttgaagcacttatt
gaaacaggtattgcttcatctaaacgccaagcacgtgaagatgtaaacaacggtgcaata 
tatattaatggtgaacgtcaacaatcagtcgattatgagttaagtagtaaagaccttatt 
gaagatgaaattacaataattcgacgaggaaagaaaaaatatttcatggttaattaccaa 
tcataa 


Sequence 1820 

MANALIEDLKWRGLIYQQTDEEGIEELLNKEQVTLYCGADPTADSLHIGHLLPFLTLRRF 
QEHGHRPIVLIGGGTGMIGDPSGKSEERVLQTESQVEANVKGLSNQMHRLFEFGSDKGAK
LVNNKDWLGQISLISFLRDYGKHVGVNYMLGKDSIQTRLEHGISYTEFTYTILQAIDFGY
LNRELNCKIQVGGSDQWGNITSGIELMRRMYGQTEAYGLTIPLVTKSDGKKFGKSESGAV
WLDPEKTSPYEFYQFWINQSDEDVIKFLKYFTFLEKEEINRLEQSKNEAPHLREAQKALA 
ENVTKFIHGEAALKDAIRISKALFSGDLKSLSAKELKEGFKDVPQVTLSTKTTNIVEALI 
ETGIASSKRQAREDVNNGAIYINGERQQSVDYELSSKDLIEDEITIIRRGKKKYFMVNYQ 
S* 


Sequence 1821 

Contig_0659_pos_6502_5840,

putative peptide of unknown function 

atgccttcagaattatggaatcgcaacaaagttaaagatttttttgggaaaatcacaact 
tttttagatggcccttatgaaaataaggtagccagaaattttaccaaaggacagtttacg
gatgatactgcacaatcactactaatcattgatgcgctaaataaaaatcattttgaacct 
tcaaaaaaaatcatagcagacgaattgattgaatgggccaatgctacaaacgcatttaaa 
aataatattcttggcccaagttctaaagcagctttaactgcaataatacaaggagaggat
tcccaattatatacaaaaaatgcattaactaatgggtcggctatgagaattgctcctatt
ggtactttattttctaaagatcaaaaagtagagcttgtaaattatgttaaggaaattagt
gaagctactcacacaagtgatgttgcaatagctggagctagtatgattgcttatgcagta 
acattggcagcggaagataagaattggaaagaaattatagaaggcgttttggatattcac 
gatatagctattaaagaaggagaacaaactttttctgcatcaatagcagaaagattaaaa 
ttagcagttcagtttgccaatcatttcgaaagtgaggaggaatatgttttctttagtcat 
tag

Sequence 1822 

MPSELWNRNKVKDFFGKITTFLDGPYENKVARNFTKGQFTDDTAQSLLIIDALNKNHFEP 
SKKIIADELIEWANATNAFKNNILGPSSKAALTAIIQGEDSQLYTKNALTNGSAMRIAPI
GTLFSKDQKVELVNYVKEISEATHTSDVAIAGASMIAYAVTLAAEDKNWKEIIEGVLDIH
DIAIKEGEQTFSASIAERLKLAVQFANHFESEEEYVFFSH* 

Sequence 1823 

Contig_0659_pos_5364_4411,

is similar to (with p-value 4.0e-60)

>sp:sp|P39668|YYXA_BACSU HYPOTHETICAL PROTEASE IN ROCR-PURA
INTERGENIC REGION (EC 3.4.21.-). >gp:gp|D78193|BACGNT2A_29 B
acillus subtilis 36kb sequence between gntz and trnY genes e
ncoding 34 ORFs. NID: g1064780. >gp:gp|Z99124|BSUB0021_141 B
acillus subtilis complete genome (section 21 of 21): from 39
99281 to 4214814. NID: g2636442.

gtgataaacatgcaaaaatctacaaaccttgatgatttattcaacggtaaggcatctaaa 
tcaaaagaagcgggaattggttccggtgtgatttatcaaataagtgaaggttccgcatat 

atcgttacaaataatcacgttgttgatggtgcttcggaaattaaagttcaactacataat
tcaaaacaagtagatgccaaattaataggtaaagacgccctaacagatattgctgttcta 
aaaataaaagatacaaaaggaataaaagcaattcaatttgctaattcgtcaaaagttcaa 
acaggagatagtgtttttgcaatgggtaatcctctaggattagaatttgcaaattctgtt
acatcaggaattatttcagctagcgaacgtacaattgacgccaatacttctgctggtaat 

actaaagttaatgttttacagacagacgctgcaataaatcccggtaattcgggtggtgca
ttagtggatattaacggaaatctcgttggtatcaattccatgaaaattgcggcagcacaa
gtagaaggtataggttttgctatacctagtaatgaagttagagtgaccatcgaacaactc
gttaaacatggtaaaatcgaacgcccttcaatcggtataggtcttataaatttaagtgat 
attcctgaaaactatcgtaaagaactacatactcataaagacaaaggcgtttatgtagct 

aaagtagacagtgaaaatgccattaaaaagggtgatattattactggaatagatggtaaa
caaataaaagatgatacagatttaagaacttatttatacgagagcaaaaaaccaggtgaa 
acggttactctaaaagttatcagagatggtaagacacaagacattaatgtaaaattaaaa 
aaacaagcatctgcatctgaatcatctcaatcacaaagtcaatttgctcaataa

Sequence 1824

VINMQKSTNLDDLFNGKASKSKEAGIGSGVIYQISEGSAYIVTNNHVVDGASEIKVQLHN 
SKQVDAKLIGKDALTDIAVLKIKDTKGIKAIQFANSSKVQTGDSVFAMGNPLGLEFANSV 
TSGIISASERTIDANTSAGNTKVNVLQTDAAINPGNSGGALVDINGNLVGINSMKIAAAQ 
VEGIGFAIPSNEVRVTIEQLVKHGKIERPSIGIGLINLSDIPENYRKELHTHKDKGVYVA

KVDSENAIKKGDIITGIDGKQIKDDTDLRTYLYESKKPGETVTLKVIRDGKTQDINVKLK
KQASASESSQSQSQFAQ*

Sequence 1825 

Contig_0659_pos_2551_1646,

is similar to (with p-value 3.0e-44)

>gp:gp|AJ002293|SPAJ2293_1 Streptococcus pneumoniae pbp1b ge
ne, partial, beta-lactam resistant. NID: g2982645.
atgcatttcaacaagtatcaaatactaactacagataaatatactaaatttgaacattta 
tataagaaggtcaaacatatatgtgtcgtaatttttttggtggtttttttgattggtttt 

attatactgttgtcattagtattatacttccaacaactaactaaagatgcttcgtcaata
agtgatcgagagttgaaagcaaaaatccttcatatacctggcgatgagctaataaatcat 
aataatcaaattttagaagaatatgatcattcacaaaatacactcatagttggaccgaac 
catgtaaattcaaatattatacatgcacttacagcctctgaagatacattattttataaa 
cataacggtattatgcctaaagcacttttaagggcgatgcttcaagatatcacaaattca 

aaccaatcttctggtggtagtactatcacgcaacaattagtaaaaaatcaagtgctctca
aataaaaaaacttatagtcgtaaagcaaatgaaattatcttggctacacgggtcgaaaat 
ttattatcaaaagatgaaatcatatatacgtatttaaatatcgtcccatttggtcatgac 
tacaacggtgccaatataactggtatatcgtctgcttcatatagtctgtttggtatacct 
gcaaaagatttgaatattgcacaatcagcttatctcattggcttactgcaaagtccatac

ggctatacgccttatgacgaacacggcaaagtaaagccttaccatcttttaaaattgagc
atgaaacgtcaacaatacgtacttaaacgtatgcgtgttgaaggaaaaatttctaaacaa
caatacgaaaacgctaaaaaatacaatattaaacagcacttgctgaaacaatcgaaagac
gaataa 

Sequence 1826

MHFNKYQILTTDKYTKFEHLYKKVKHICVVIFLVVFLIGFIILLSLVLYFQQLTKDASSI 
SDRELKAKILHIPGDELINHNNQILEEYDHSQNTLIVGPNHVNSNIIHALTASEDTLFYK
HNGIMPKALLRAMLQDITNSNQSSGGSTITQQLVKNQVLSNKKTYSRKANEIILATRVEN
LLSKDEIIYTYLNIVPFGHDYNGANITGISSASYSLFGIPAKDLNIAQSAYLIGLLQSPY
GYTPYDEHGKVKPYHLLKLSMKRQQYVLKRMRVEGKISKQQYENAKKYNIKQHLLKQSKD
E* 

Sequence 1827

Contig_0660_pos_7907_7290,

is similar to (with p-value 3.0e-52)

>sp:sp|P52996|PANB_BACSU 3-METHYL-2-OXOBUTANOATE HYDROXYMETH
YLTRANSFERASE (EC 2.1.2.11) (KETOPANTOATE HYDROXYMETHYLTRANS
FERASE). >gp:gp|L47709|BACYPIA_17 Bacillus subtilis (clone Y
AC15-6B) ypiABF genes, qcrABC genes, ypjABCDEFGHI genes, bir
A gene, panBCD genes, dinG gene, ypmB gene, aspB gene, asnS
gene, dnaD gene, nth gene and ypoC gene, complete cds's. NID
: g1146223. >gp:gp|Z99115|BSUB0012_183 Bacillus subtilis com
plete genome (section 12 of 21): from 2195541 to 2409220. NI
D: g2634478.

atgacagtgttaggatatgatagtactgttcaagttacattgaacgatatgattcatcat 
ggtaaggctgttaaaagaggtgcttcagatacatttatagttgttgatatgcctataggg 
actgttggtttaagtgatgaagaagatctaaaaaatgcacttaagctttatcaaaacacg 
aatgctaacgctgtcaaagtagaaggggctcatcttacatcatttattcaaaaagcaact 

aaaatgggtatacctgttgtttctcacttaggtcttacacctcaaagtgtaggtgtaatg
gggtataaacttcaaggggatacaaagacagccgctatgcaacttatcaaagatgctaaa 
gctatggaaactgctggtgcagtagtactggttttagaagccatacctagtgatttagct 
cgagaaattagtcagcaactcactattccagttataggtataggggcaggaaaagatact 
gatgggcaagtgttagtgtatcatgatatgttaaattatggtgttgatcgacacgctaag 

tttgttaagcaatttgcagacttttcaagtggtattgatggattaaggcaactgaaaata
acaaataaatacctatga 

Sequence 1828 

MTVLGYDSTVQVTLNDMIHHGKAVKRGASDTFIVVDMPIGTVGLSDEEDLKNALKLYQNT
NANAVKVEGAHLTSFIQKATKMGIPVVSHLGLTPQSVGVMGYKLQGDTKTAAMQLIKDAK
AMETAGAVVLVLEAIPSDLAREISQQLTIPVIGIGAGKDTDGQVLVYHDMLNYGVDRHAK
FVKQFADFSSGIDGLRQLKITNKYL*

Sequence 1829

Contig_0660_pos_7093_6287,

putative peptide of unknown function 

atggatccgagtttgattttaccttatttatgggtacttttagtacttgtatttttagaa 
ggattattagcagctgataatgcaattgtaatggcggtaatggttaaacatctaccacct 
aaacaacgtaaaaaagcacttttttatggcctattaggtgcattcatttttagatttatt 

gctttatttttaataagtattattgcaaacttctggtggattcaagcagcaggtgctgtt
tacttaatctatatgtctattaaaaatttatggcaatttttccatcagtcaaatgaaaaa 
caccataaagaaacaggagacgaacatcatttcgatgaaacaggcaacgaaaaagaagta 
ggccctaaatctttttggggaacagtatttaaagttgaattcgctgatatcgcgtttgca 
attgattcgatgcttgccgcattagccatagccgttacattaccaaaagttggcatacat

tttggtggtatggacttaggccaatttattgttatgttccttggtggaatgataggtgtc
atcttgatgagatttgcagcaacttggtttgtagaattgttgaataaatatccaggactt 
gaaggtgctgcgtttgcaattgtaggttgggtaggtattaaacttgttataatggtactt 
gcacatcctgatattggcgttttaccagaagcatttccacatagtgctttatggcaaaca 
atcttctgggtagtattagttggcttagttttaataggatggttaacttcagcaattggc 

aacaagaaaaaaggtaatcaaaaataa

Sequence 1830 

MDPSLILPYLWVLLVLVFLEGLLAADNAIVMAVMVKHLPPKQRKKALFYGLLGAFIFRFI
ALFLISIIANFWWIQAAGAVYLIYMSIKNLWQFFHQSNEKHHKETGDEHHFDETGNEKEV
GPKSFWGTVFKVEFADIAFAIDSMLAALAIAVTLPKVGIHFGGMDLGQFIVMFLGGMIGV
ILMRFAATWFVELLNKYPGLEGAAFAIVGWVGIKLVIMVLAHPDIGVLPEAFPHSALWQT 
IFWVVLVGLVLIGWLTSAIGNKKKGNQK* 

Sequence 1831

Contig_0660_pos_5800_4319,

is similar to (with p-value 3.0e-49)

>gp:gp|AF000658|SPDNAARG_2 Streptococcus pneumoniae R801 tRN
A-Arg gene, partial sequence, and putative serine protease (
sphtra), SPSpoJ (spspoJ), initiator protein (spdnaa) and bet
a subunit of DNA polymerase III (spdnan) genes, complete cds
. NID: g2109442.

atgaataaatctaaagacgacgataataaaattggagaagaaagtctacatgatgtgcga
gtttcaagtgatacctccactttaccgcatcaaaataagtcaataaaagactatgatgat 

tctggaaacgaaagtaaacaacatactaaattgacttcaaaggaatctatgctaggcgta
aattctaatcatactgagcaagattcaagaagtacacaaccatattcttcaaaacatagc 
tattcacaaccaaaagataaagacaacgataatactcaacaagcgcaatttcttaaaaaa 
gaagacaaacaacgtaacagagccgaaaatataaaaaaagttaatgaatttaaacaattg 
gtggtagctttctttaaagaacactggcctaaaatgttaattattatcggtattatagta 

ttacttttaatattaaatgccatattcactacagttaataaaaatgatcatacaaatgat
agtgcatttaacggtacagctaaagatgaaacaacagcgatgaaaattgctgaaaactct 
gttaagtcagttgtaactgtcgagaatgatttgtctaatgacacgactgtgtctgataac 
aaaaatgaatctgataatgagataggatcaggtgtcgtctacaaaaaagtgggcgactct 
atttatatttttactaatgcacacgttgtcggtgatcaagaaaaacaaaaagtaacatat 

ggtaatgataaatctgtgacaggtaaggtaattggtaaagataaatggtctgatttagca
gttgtaaaagctaaagttgctgacgaaaatattaaaccaatgactatgggggattctaat 
aatattaaattggctgaacctattttagttataggcaatcctctaggcacagactttaaa 
ggaagtgtttctcaaggtattgtgtctggactcaatcgtcatgtacctgtggacattgat 
aaaaatgataattatgatgctttgatgaaagcttttcaaattgatgcgcctgtgaaccca 

ggaaattcaggtggtgctgtggtcgatagagacggtagactcataggtatagtttcttta
aaaattgatatgcataatgtagaaggaatggcttttgcgatacctattaatgatgtacgt 
aagattgcgaaagaattagagcataaaggtaaagtgaactaccccaatactgaaatcaaa 
atcaaaaatgttggtgaccttgacgattctgaacgtaatgcaatcaacttaccagctaaa 
gtgaatcatggtgtattaatcggtgaagtgaaagaaaatggtttaggagacaaagcaggt 

ttaaaaaaaggtgatgtaatagtagaattagatggtaagaagattgaagataacttacga
tatagacaagtcatatatagtcattatgatgatcaaaaaacaattactgctaaaatttat 
cgaaatggtgcggagaaaaatattaaaatcaaattgaaataa 

Sequence 1832 

MNKSKDDDNKIGEESLHDVRVSSDTSTLPHQNKSIKDYDDSGNESKQHTKLTSKESMLGV
NSNHTEQDSRSTQPYSSKHSYSQPKDKDNDNTQQAQFLKKEDKQRNRAENIKKVNEFKQL
VVAFFKEHWPKMLIIIGIIVLLLILNAIFTTVNKNDHTNDSAFNGTAKDETTAMKIAENS
VKSVVTVENDLSNDTTVSDNKNESDNEIGSGVVYKKVGDSIYIFTNAHVVGDQEKQKVTY
GNDKSVTGKVIGKDKWSDLAVVKAKVADENIKPMTMGDSNNIKLAEPILVIGNPLGTDFK 

GSVSQGIVSGLNRHVPVDIDKNDNYDALMKAFQIDAPVNPGNSGGAVVDRDGRLIGIVSL
KIDMHNVEGMAFAIPINDVRKIAKELEHKGKVNYPNTEIKIKNVGDLDDSERNAINLPAK 
VNHGVLIGEVKENGLGDKAGLKKGDVIVELDGKKIEDNLRYRQVIYSHYDDQKTITAKIY 
RNGAEKNIKIKLK* 

Sequence 1833

Contig_0660_pos_4301_2943,

is similar to (with p-value 2.0e-42)

>sp:sp|P43440|NTPJ_ENTHR V-TYPE SODIUM ATP SYNTHASE SUBUNIT
J (EC 3.6.1.34) (NA(+)-TRANSLOCATING ATPASE SUBUNIT J). >gp
:gp|D17462|ENENTP_11 Enterococcus hirae ntp genes for Na+-A
TPase subunits, complete cds. NID: g487271.

atgtcagttttaagtcagctgcttaaaaaatctagccctcaacaagggattatactctat 
tatctatttgccatcgtcgttgcatttttattattaaatttgccatatgttcataaacaa 
ggggttgaggttaatccaattgatacactttttgtagctgtatcaggtattagtgttaca 
ggattatcaccaattactatagtcgacacatactcaacctttggtcaaattatcatactt
attattttgaacatcggtggtataggtgttatggctattggaaccatgctttgggttgtt 
ttaggcaaacatattggaattagagaacgacaactcattatgttagataataatagagat 
actatgagtggaacggttaaactgattttagaaattgtaagaacaatttttattattgag 
tttataggtgcactcttactcgcgttctatttttatcgggataatcctgatttaaaaaac
gcattaatgcaaggcatttttgtgtcagtatctgctacgactaacggtggtcttgatatt
actggtgaatctttagttccctatgcaaaagactattttgtacaaacaatagtaatgttc 
ttaattgtgttaggatctataggctttccagttttgttggagttaaaagcatatataaag 
aatcgagtaactaattttaggttctcattattcacaaaaataacgacaacaacgtattta 
ttcttatttttattcggcgttattgttgtattaatattagaacatagcaatgcatttaaa
ggattgagttggcatcaatcattattttatgcattgttccaatcatcaacgaccagaagt 
gcaggtttgcaaacgatagatgtgtcacatttcagcgacgcaacaaatattgtaatggga 
ttattgatgtttataggatcatctcccagttctgtaggaggtggaatcagaacaacaaca 
tttgctattttaattttgtttgttattaattttaataatactggtgacaaaacaggtatt 

aaaattttcaacagagaagtacatattatggatgtacaaagatcatttgccgtatttact
atggcgtcattaattacatttattagcatgattattatatctgccactgaacaaggcaag 
ttgtcctttttacaaattttctttgaagtaatgtctgcgtttggtacatgtggtttaagc 
ttaggtgtgacaagcgatgtcaacgacattaccaaggctgtattaatgatattaatgttt 
ataggtcgtgtaggacttatttcattcattattatgttagctggacgtaaagaacccgaa 

aaatatagctatcctaaagaacgtatacaaattggatag

Sequence 1834 

MSVLSQLLKKSSPQQGIILYYLFAIVVAFLLLNLPYVHKQGVEVNPIDTLFVAVSGISVT
GLSPITIVDTYSTFGQIIILIILNIGGIGVMAIGTMLWVVLGKHIGIRERQLIMLDNNRD 

TMSGTVKLILEIVRTIFIIEFIGALLLAFYFYRDNPDLKNALMQGIFVSVSATTNGGLDI
TGESLVPYAKDYFVQTIVMFLIVLGSIGFPVLLELKAYIKNRVTNFRFSLFTKITTTTYL
FLFLFGVIVVLILEHSNAFKGLSWHQSLFYALFQSSTTRSAGLQTIDVSHFSDATNIVMG
LLMFIGSSPSSVGGGIRTTTFAILILFVINFNNTGDKTGIKIFNREVHIMDVQRSFAVFT
MASLITFISMIIISATEQGKLSFLQIFFEVMSAFGTCGLSLGVTSDVNDITKAVLMILMF

IGRVGLISFIIMLAGRKEPEKYSYPKERIQIG*

Sequence 1835 

Contig_0660_pos_2849_1509,

putative peptide of unknown function 

atgaatggtacatgttacactaaattcaaggagaggagtggctctatggaaaaaaatgaa
aataaaatagttgatgtgattgcaacttctgatatgcatagtcacttcttaaatggtgat 
tatggctcaaacatttatcgggctggtacttatgttaatgaagctcgaaaaaataatgaa 
aatgtcatattactagatagtgggggaagtttggcgggatcacttgcagcattttattac 
gccgttgtagcaccatataaacgtcatcctatgattaagttgatgaatgcaatgcagtat 

gatgctagtggaattagccctaatgaatttaaatttggattatcttttttgacacgttca
gttgcactctcaaggtttccttggctatcagcaaatatagagtatactgttactagagag 
ccatatttttctacgccttatacaatcaaaatgtattcagatttgaaaattgctatcgta 
ggtttaacatcagatggattaatgaagaacgagtacgcagaaatggaagaagatgtctgt 
attgagaagactttggtttcagctaaacgttggattagatacatacatgaagttgaagaa 

ccagacttccttattgttatttatcatggtgggttaaataaaattagtagtgccaataaa
agaaatgaaaaaaatgcaaacgaagctgaaaaaattatggaagaacttggtgttattgat
gtaattattaccgctcatcaacatcaaacagtagttggaaaagatcatggaactatatat
gttcaagcaggtcaaaatgctgaggaattagtacatctttcaattaaatttaagaaacgt 
acaacttcttatgagattgagcacatcgactcaaaagttattgacttaaatgattaccat 

gaagatgagcaattattaaaagaaacatattatgatcgtaaggcagtcaaacactgggca
aattcagtagtttcaaacaaaaacaatggcttaacagttcaatgtattgaagatattatt
tgtaagccgcatccttttactcaattattacatgatgcaattagattagcctataattat
gatatttcttgtgtgcatatacctaagaatggtgaggaagggttaaaaggaactataaga 
aatagagatatatacgatgcataccctcatccagataaacctatagatatcactgtcaaa 

ggtaaaaatatcaaagatatacttgaatacagttatgcgcatattgattttaataagcga
caatttattagaacggaggttaaattacaatttatttgtttgattaaatttaactattat 
tcatttggacatctattttaa 

Sequence 1836 

MNGTCYTKFKERSGSMEKNENKIVDVIATSDMHSHFLNGDYGSNIYRAGTYVNEARKNNE
NVILLDSGGSLAGSLAAFYYAVVAPYKRHPMIKLMNAMQYDASGISPNEFKFGLSFLTRS
VALSRFPWLSANIEYTVTREPYFSTPYTIKMYSDLKIAIVGLTSDGLMKNEYAEMEEDVC
IEKTLVSAKRWIRYIHEVEEPDFLIVIYHGGLNKISSANKRNEKNANEAEKIMEELGVID
VIITAHQHQTVVGKDHGTIYVQAGQNAEELVHLSIKFKKRTTSYEIEHIDSKVIDLNDYH
EDEQLLKETYYDRKAVKHWANSVVSNKNNGLTVQCIEDIICKPHPFTQLLHDAIRLAYNY
DISCVHIPKNGEEGLKGTIRNRDIYDAYPHPDKPIDITVKGKNIKDILEYSYAHIDFNKR
QFIRTEVKLQFICLIKFNYYSFGHLF* 

Sequence 1837

Contig_0660_pos_1463_1002,

is similar to (with p-value 2.0e-48)

>gp:gp|Y17554|BLY17554_2 Bacillus licheniformis arcA, arcB,
arcC and arcD genes. NID: g3687415.

atgaatttccatcttgtttgtcctaaagaactcaatccgacagaagaattattaaatcgt
tgcgaacgtattgcgacggaaaatggcggtaacattttaataacagatgatattgataaa 
ggcgtgaaagattctgatgttatttatacagatgtttgggtatcaatgggcgaacctgat 
gaagtatggcaagaacgccttaaacttttaaaaccatatcaagttaaccaagcattatta 
gaaaaaacaggcaatccaaatgttatttttgaacattgtttaccttctttccacaatgca 

gaaactaaaattggtcaacaaatttatgaaaaatatggcattagtgaaatggaagtcact
gatgatgtcttcgaaagcaaagcttctgtagtattccaagaagctgagaatagaatgcat 
acaattaaagcggtcatggtagcaactttaggagaattctaa 

Sequence 1838 

MNFHLVCPKELNPTEELLNRCERIATENGGNILITDDIDKGVKDSDVIYTDVWVSMGEPD
EVWQERLKLLKPYQVNQALLEKTGNPNVIFEHCLPSFHNAETKIGQQIYEKYGISEMEVT
DDVFESKASVVFQEAENRMHTIKAVMVATLGEF* 

Sequence 1839

Contig_0661_pos_1122_2249,

is similar to (with p-value 0.0e+00)

>gp:gp|AF009352|AF009352_2 Bacillus subtilis osmoprotectant
transport system OpuC including ATPase (opuCA), transmembran
e protein (opuCB), osmoprotectant binding protein precursor
(opuCC) and transmembrane protein (opuCD) genes, complete cd
s. NID: g2271388. >gp:gp|Z99121|BSUB0018_69 Bacillus subtili
s complete genome (section 18 of 21): from 3399551 to 360906
0. NID: g2635827.

atgaccatagacattgaatcaggagactttattgcatttattgggacaagtggtagcggt

aaaacaactgcccttagaatgattaatcgtatgattgaatctacagagggagaaattacc
attgacggtaaaaatatcaaagagcttaatccggttgagcttcgtcgcagtatcggttat 
gtcatacaacaaatcggcttaatgccacatatgacagtgaaagagaatattgttctcgta 
ccaaagttattaaaatggtcacaagagaaaaaggatgagaaagcgaaagaacttatacgc 
ttagtagatttaccagaagaatatttagatcgatatccttcagaattatcaggtggtcaa 

caacaacgtattggtgttgtaagagcactcgcagctgaacaagatattattttaatggat
gaaccgtttggtgcactcgatccaatcacaagagatacattacaagacttagtcaaaaaa 
ttacaacaacaattaggaaagacattcatttttgttacacatgatatggatgaagcaatc 
aaacttgcagataaaatatgtattatgacaaatggacaggtgattcaatatgacacgcca 
gataatattttacgtagtccagcgaatgatttcgttagagactttattggtcagaatcgc 

ttaattcaagatagacctaatatccgtacagttaaagatgcgatgattaaacccgtgaca
gtacatgttgaccgttctcttaatgatgcggtgaatattatgagagagaaaagagtcgat 
acgatatttgttgtcggcaatgatgagcatttattgggttatttagatattgaagatatt 
aacgaaggattaagacatcataaagaacttatagatacgatgcaacgagatatttataga 
gtacgtattgatagtaagttacaagattctgttcgtacaattcttaaacgtaatgtacgt 

aatgtacccgttgttgacagtgataataaaacattattaggccttgtcacccgagctaac
cttgtagacattgtttatgacagtatttggggagagttagaatcgggtaacaatgataat
cattctgggattgttgaacccgagtccacaggagttgagacaccatga 

Sequence 1840 

MTIDIESGDFIAFIGTSGSGKTTALRMINRMIESTEGEITIDGKNIKELNPVELRRSIGY
VIQQIGLMPHMTVKENIVLVPKLLKWSQEKKDEKAKELIRLVDLPEEYLDRYPSELSGGQ
QQRIGVVRALAAEQDIILMDEPFGALDPITRDTLQDLVKKLQQQLGKTFIFVTHDMDEAI
KLADKICIMTNGQVIQYDTPDNILRSPANDFVRDFIGQNRLIQDRPNIRTVKDAMIKPVT
VHVDRSLNDAVNIMREKRVDTIFVVGNDEHLLGYLDIEDINEGLRHHKELIDTMQRDIYR
VRIDSKLQDSVRTILKRNVRNVPVVDSDNKTLLGLVTRANLVDIVYDSIWGELESGNNDN
HSGIVEPESTGVETP* 

Sequence 1841

Contig_0661_pos_2336_2881,

is similar to (with p-value 2.0e-44)

>gp:gp|AF009352|AF009352_3 Bacillus subtilis osmoprotectant
transport system OpuC including ATPase (opuCA), transmembran
e protein (opuCB), osmoprotectant binding protein precursor
(opuCC) and transmembrane protein (opuCD) genes, complete cd
s. NID: g2271388. >gp:gp|Z99121|BSUB0018_68 Bacillus subtili
s complete genome (section 18 of 21): from 3399551 to 360906
0. NID: g2635827.

atgattgtagcggttcctttaggcattttattatctaaaaaggaaaagctttcgaaagtg 
tcattgacgatagctggcgttttacaaacgatacctacattagcggtattagctttaatg
attccattatttggtgtaggaaaaacacctgcaattatagcgttatttttatatgtatta 
ttaccaattttaaataatacgattataggcattcaaaatatagattccaaccttagagaa 
gcaggacgtagtatgggaatgactaactttcaattgatgaaagatgtggagttgccactc 
gcattaccattaatacttagtggaattaggctgtcttctgtctacgttattagttgggca 
acattggcaagttatgttggtgctggtggtttaggtgattttatctttaacggattagcg
ctatttgaaccgagtatgattattactgcaactattcttgtcactgctattgcacttatt 
gtagattatgttttatcactgattgaaaaatgggttgtacctaaaggattaaaagtttcc 
agataa 

Sequence 1842

MIVAVPLGILLSKKEKLSKVSLTIAGVLQTIPTLAVLALMIPLFGVGKTPAIIALFLYVL
LPILNNTIIGIQNIDSNLREAGRSMGMTNFQLMKDVELPLALPLILSGIRLSSVYVISWA
TLASYVGAGGLGDFIFNGLALFEPSMIITATILVTAIALIVDYVLSLIEKWVVPKGLKVS 
R* 


Sequence 1843 

Contig_0661_pos_2900_3859, 

is similar to (with p-value 3.0e-93)

>gp:gp|AF009352|AF009352_4 Bacillus subtilis osmoprotectant
transport system OpuC including ATPase (opuCA), transmembran
e protein (opuCB), osmoprotectant binding protein precursor
(opuCC) and transmembrane protein (opuCD) genes, complete cd
s. NID: g2271388.

atgaagaattataaaaaatatctaattgttttagtattatgtctaacagtgttatctgga 
tgtaacttacccggtttaaaaaatagtcattcagatgacgatgttagaatcacaagttta
ggaacaagtgaatcacaaattatatcgcacatgctgagattacttattgaacatgataca 
aaaggtgaaattaaacctaccttaattaataatttaggttcaagtacgattcaacataat 
gctgtcacaagtggccaagctaatatgtcaggtacgcgttatacaggcactgacttaaca 
ggggcgttgaaagaagatccgattaaagatcctaaaaaggccatgaaagcgactcaagaa 
gggtttaaaaagaaatacaatcaaaccttcttcaattcatatggttttgaaaacacatat
gcattgatagtaacaaaagagacagctaagaaatatcatttagaaactgtttcagactta 
gaaaagcatgctaaagatttaagagtaggtatggatagttcatggatggaccgtaaaggt 
gacggttatccagcatttaaaaaagaatatggctatagtttcggtacggtaagacctatg
caaattggtcttgtctatgatgcactgagttctggcaaattagacgttgccgtaggttat
tctacagatggacgtatttcagcatatgatttaaaagtgttgaaagatgatcgtcgattc
ttcccaccatacgatgccagtccacttgcatcagatcaattgttaaaagagaagccagag 
ctcaaaccgattgttaaaaaattagaaggtaagatatcaacagaacaaatgcaagaatta 
aattatcaagctgatggtaaagggaaagagcctgcaacagttgcagaggacttcttgaaa 
aaacatcattattttgaagacgatgacaataaaaaagataaacagaaaggtggtcaataa 


Sequence 1844 

MKNYKKYLIVLVLCLTVLSGCNLPGLKNSHSDDDVRITSLGTSESQIISHMLRLLIEHDT
KGEIKPTLINNLGSSTIQHNAVTSGQANMSGTRYTGTDLTGALKEDPIKDPKKAMKATQE
GFKKKYNQTFFNSYGFENTYALIVTKETAKKYHLETVSDLEKHAKDLRVGMDSSWMDRKG
DGYPAFKKEYGYSFGTVRPMQIGLVYDALSSGKLDVAVGYSTDGRISAYDLKVLKDDRRF 
FPPYDASPLASDQLLKEKPELKPIVKKLEGKISTEQMQELNYQADGKGKEPATVAEDFLK 
KHHYFEDDDNKKDKQKGGQ* 


Sequence 1845 

Contig_0661_pos_3862_4509,

is similar to (with p-value 3.0e-47)

>gp:gp|AF009352|AF009352_5 Bacillus subtilis osmoprotectant
transport system OpuC including ATPase (opuCA), transmembran
e protein (opuCB), osmoprotectant binding protein precursor
(opuCC) and transmembrane protein (opuCD) genes, complete cd
s. NID: g2271388. >gp:gp|Z99121|BSUB0018_66 Bacillus subtili
s complete genome (section 18 of 21): from 3399551 to 360906
0. NID: g2635827.

atggaagggaatttaatacaacaactcgttcactactatcaaatgaactttggctaccta 
tgggaattgtttgttaatcatttgttaatgtctgtttacggtgttttactagcatgttta 
gtgggcatacctcttggtatcatcattgcacgatttggtaaattatcaggtgttattatc 
actattgcgaatattattcaaactgttccggttattgctatgttagcgatattaatgctt 

agtatgggacttggtatgaatacagtcatttttactgtgttcttatatgccttgcttcct
attattaaaaatacgtatacaggaattaatgaagttgacccaaatattaaagatgctgga 
aaagggatgggtatgacgcgtaaccaagtattaactatgattgagttacctctgtcactt 
tcagttattataggtggtattcggattgcccttgttgtagctattggtgtcgtagctgta 
ggttcatttattggtgcaccaacattaggtgatattgtgattagaggtacaaatgcaact 

gatggaacactatttattctagcaggtgccatacctatcgttatcattgtcatacttata
gatgttttattacgtttattagagaaaaagctagatccagctacgtaa 

Sequence 1846 

MEGNLIQQLVHYYQMNFGYLWELFVNHLLMSVYGVLLACLVGIPLGIIIARFGKLSGVII
TIANIIQTVPVIAMLAILMLSMGLGMNTVIFTVFLYALLPIIKNTYTGINEVDPNIKDAG
KGMGMTRNQVLTMIELPLSLSVIIGGIRIALVVAIGVVAVGSFIGAPTLGDIVIRGTNAT
DGTLFILAGAIPIVIIVILIDVLLRLLEKKLDPAT* 

Sequence 1847

Contig_0661_pos_5935_5594,

is similar to (with p-value 8.0e-28)

>sp:sp|P44023|YFCC_HAEIN HYPOTHETICAL PROTEIN HI0594. >pir:p
ir|E64010|E64010 hypothetical protein HI0594 - Haemophilus i
nfluenzae (strain Rd KW20) >gp:gp|U32741|U32741_2 Haemophilu
s influenzae Rd section 56 of 163 of the complete genome. NI
D: g1573582.

gtgcaacatatgagtgggcctttatttatcattgttctgctctttatctttttctgttta 
ggatttatcgtgccgtcctcatcaggattagcagtactatctatgcctatctttgcgcca 
ttagctgatacagtaggtataccaagatttgttattgttacaacatatcaattcggtcag 
tatgcaatgttgttcttagcgcctactggacttgtaatggcaacacttcaaatgttaaac
atgcgctactcacactggttacgtttcgtatggcctgttgtcgcgtttgttttaatattt 
ggtggaggcttacttattacacaagttttaatatactcataa 

Sequence 1848 

VQHMSGPLFIIVLLFIFFCLGFIVPSSSGLAVLSMPIFAPLADTVGIPRFVIVTTYQFGQ
YAMLFLAPTGLVMATLQMLNMRYSHWLRFVWPVVAFVLIFGGGLLITQVLIYS*

Sequence 1849 

Contig_0661_pos_4460_4065,
is similar to (with p-value 2.0e-30)

>gp:gp|AF009352|AF009352_5 Bacillus subtilis osmoprotectant
transport system OpuC including ATPase (opuCA), transmembrane
protein (opuCB), osmoprotectant binding protein precursor
(opuCC) and transmembrane protein (opuCD) genes, complete cds.
NID: g2271388. >gp:gp|Z99121|BSUB0018_66 Bacillus subtilis
complete genome (section 18 of 21): from 3399551 to 3609060.
NID: g2635827.

atgacaatgataacgataggtatggcacctgctagaataaatagtgttccatcagttgca 
tttgtacctctaatcacaatatcacctaatgttggtgcaccaataaatgaacctacagct
acgacaccaatagctacaacaagggcaatccgaataccacctataataactgaaagtgac 
agaggtaactcaatcatagttaatacttggttacgcgtcatacccatcccttttccagca 
tctttaatatttgggtcaacttcattaattcctgtatacgtatttttaataataggaagc 
aaggcatataagaacacagtaaaaatgactgtattcataccaagtcccatactaagcatt 
aatatcgctaacatagcaataaccggaacagtttga

Sequence 1850 

MTMITIGMAPARINSVPSVAFVPLITISPNVGAPINEPTATTPIATTRAIRIPPIITESD 
RGNSIIVNTWLRVIPIPFPASLIFGSTSLIPVYVFLIIGSKAYKNTVKMTVFIPSPILSI 
NIANIAITGTV*

Sequence 1851 

Contig_0661_pos_2553_2143,

putative peptide of unknown function 

atgcctataatcgtattatttaaaattggtaataatacatataaaaataacgctataatt
gcaggtgtttttcctacaccaaataatggaatcattaaagctaataccgctaatgtaggt 
atcgtttgtaaaacgccagctatcgtcaatgacactttcgaaagcttttcctttttagat 
aataaaatgcctaaaggaaccgctacaatcattgcaataaccaatgccactatcgatatg 
tacaaatgttcgaatgtctttgcaaataacataccactatgttccgaaataaactttatc 

atggtgtctcaactcctgtggactcgggttcaacaatcccagaatgattatcattgttac
ccgattctaactctccccaaatactgtcataaacaatgtctacaaggttag

Sequence 1852 

MPIIVLFKIGNNTYKNNAIIAGVFPTPNNGIIKANTANVGIVCKTPAIVNDTFESFSFLD
NKMPKGTATIIAITNATIDMYKCSNVFANNIPLCSEINFIMVSQLLWTRVQQSQNDYHCY
PILTLPKYCHKQCLQG* 

Sequence 1853 

Contig_0664_pos_5849_5160,

is similar to (with p-value 1.0e-55)

>sp:sp|P52998|PANC_BACSU PANTOATE--BETA-ALANINE LIGASE (EC 6.3.2.1)
(PANTOTHENATE SYNTHETASE) (PANTOATE ACTIVATING ENZYME).
>gp:gp|L47709|BACYPIA_18 Bacillus subtilis (clone YAC15-6B) ypiABF genes,
qcrABC genes, ypjABCDEFGHI genes, birA gene, panBCD genes, dinG gene,
ypmB gene, aspB gene, asnS gene, dnaD gene, nth gene and ypoC gene,
complete cds's. NID: g1146223. >gp:gp|Z99115|BSUB0012_182 Bacillus
subtilis complete genome (section 12 of 21): from 2195541 to 2409220.
NID: g2634478.

gtgaatcccttgcagtttggacctaatgaggattttgatgcttatccacgtcaactcgat
gatgatgtggctgcagtaaaaaagttacaagtggattatgttttccatccgagtgtagat 
gaaatgtatccagaagaattaggtattcatctgaaagttggacacttggcacaagtatta 
gagggagcacaaagacctggacacttcgaaggtgttgtgaccgtggtcaacaaactattt 
aatattgtgcaaccagattatgcctattttgggaaaaaggatgcacaacaattagctatt 

gttgaaaagatggttaaagactttaatcttcctgtacatgttatcggtattgatatcgta
agagaaaaagatggtttagccaaaagctctagaaatatttacttgacctctgaagaacga 
aaagaggcaaaacatttatatcaaagtctacgcttagcaaagaatttgtatgaagcgggt 
gaacgagatagcaatgagattataggtcaaatcgctgcgtatttaaacaaaaatattagt 
ggacatattgatgatttgggtatttatagttatccaaatcttatacaacaatcaaagatt 

catggacgaatattcatatcattggcagttaaattttctaaagcaagattgatagataat
ataattattggagatgactatattgattag 

Sequence 1854 

VNPLQFGPNEDFDAYPRQLDDDVAAVKKLQVDYVFHPSVDEMYPEELGIHLKVGHLAQVL 
EGAQRPGHFEGVVTVVNKLFNIVQPDYAYFGKKDAQQLAIVEKMVKDFNLPVHVIGIDIV
REKDGLAKSSRNIYLTSEERKEAKHLYQSLRLAKNLYEAGERDSNEIIGQIAAYLNKNIS
GHIDDLGIYSYPNLIQQSKIHGRIFISLAVKFSKARLIDNIIIGDDYID*

Sequence 1855

Contig_0664_pos_5152_4781,

is similar to (with p-value 4.0e-35) 

>sp:sp|P52999|PAND_BACSU ASPARTATE 1-DECARBOXYLASE PRECURSOR
(EC 4.1.1.11) (ASPARTATE ALPHA-DECARBOXYLASE).
>gp:gp|L47709|BACYPIA_19 Bacillus subtilis (clone YAC15-6B) ypiABF genes,
qcrABC genes, ypjABCDEFGHI genes, birA gene, panBCD genes, dinG gene,
ypmB gene, aspB gene, asnS gene, dnaD gene, nth gene and ypoC gene,
complete cds's. NID: g1146223. >gp:gp|Z99115|BSUB0012_181 Bacillus
subtilis complete genome (section 12 of 21): from 2195541 to 2409220.
NID: g2634478.

atgaactcaaaaatccatagagctagagttacagaatctaatttaaattacgttggaagc 
ataacaatagatgccaatatattagatgcggttgatattttacccaatgaaaaggttgct 
attgttaataataataatggtgctcgatttgaaacatatgtcattgctggagaacgtggt 
agtggaaagatgtgtttaaatggcgcggcttcaagactagttgaagttggagacgtcatt 

attattatgacatatgcacaattaaatgaagatgaaatggtagatcactcaccaaaagta
gctgtgttaaatgaaaataatgaaattatagaaatgataaatgagaaagaaaatacgata 
tcaaatgtataa 

Sequence 1856 

MNSKIHRARVTESNLNYVGSITIDANILDAVDILPNEKVAIVNNNNGARFETYVIAGERG
SGKMCLNGAASRLVEVGDVIIIMTYAQLNEDEMVDHSPKVAVLNENNEIIEMINEKENTI 
SNV* 

Sequence 1857 
Contig_0664_pos_4363_3353,

putative peptide of unknown function 

atgcattgtccgaattgcggtaatccaatagaagatgatgatttattctgtggtgaatgt
ggacataaaataagtcgacatccacagtcagtcagaaatgcagaaagtgaaataacaaaa 
gctgaaaagaatgatgaagaacaaaacataacatcaaataataaagaaaacaacgcagcg 

actcatcaaaatgttgattcaacatctcatgatgagactagatcaaatgaaaatgacgta
gcagattcaacattacaatctaagcagtcacataatgatacccaacaatcaaatctctct 
acataccatcaaagacctcaacatcgagaaattcctcaaaatcaacacaatcacgatcaa 
cagcaaagtcaaataggtcaacaagctaaacaagtaacaaatgaaagtaaaggtttcttt 
aaaagtgcatttactgcacctgataaaatcattcaaactaatcatgttttcagttttaaa 

ttattattatcattattaataatcggttttattgtattagcaattttactcgcttccgta
ataccagttgagattggtattttcggtactacaagaggaagtttggtaacgagtatcatt 
tttggtattattctatttttggttgtcatagtaggtgcaatatttgggcttacacgttta 
gtagttagacaacctattgcatttaaaaaagtattatcagactatgtgttaattaatagt 
gtttcgttagcaattttaattatttctgtaattttaatattagcagaatcatacagcttt 

ggcggaagtatagcattattgtctttattattatttattgcttctggtatttatctaatt
gcgaagtatagcactggtaatcaaacaagaatatccagcttttatggtgtgattatttat 
atcattattttgttcttatttattcgtatttttggggaggcatttttccatcaaatattt 
ggtgattttatagaagaactaggggatttatttgaaggaggaacttattaa 

Sequence 1858

MHCPNCGNPIEDDDLFCGECGHKISRHPQSVRNAESEITKAEKNDEEQNITSNNKENNAA 
THQNVDSTSHDETRSNENDVADSTLQSKQSHNDTQQSNLSTYHQRPQHREIPQNQHNHDQ
QQSQIGQQAKQVTNESKGFFKSAFTAPDKIIQTNHVFSFKLLLSLLIIGFIVLAILLASV 
IPVEIGIFGTTRGSLVTSIIFGIILFLVVIVGAIFGLTRLVVRQPIAFKKVLSDYVLINS

VSLAILIISVILILAESYSFGGSIALLSLLLFIASGIYLIAKYSTGNQTRISSFYGVIIY
IIILFLFIRIFGEAFFHQIFGDFIEELGDLFEGGTY*

Sequence 1859 

Contig_0664_pos_3349_2018,
is similar to (with p-value 6.0e-22)

>gp:gp|AF051917|AF051917_2 Staphylococcus aureus plasmid pSK41,
complete sequence. NID: g3676412.

atgaaatattgcagtaattgtggtcaacctcttcgagaaggtgtaaaggtgtgtacaaat 
tgtggtacgcctgtgagaaatgatggacctaattataaacattcagaacaacgttattct
catcaacaaccacgttccaataagagtaataaaaaaacttggttgattgttactatagtg 
ttagccattatcattgctttggtagtaatctttactatagctaaaaatcaaatgtctcca 
gaaaaacaagcgactcacattgcacatgctatcaaaaaagacgatgctaaatcattatct 
aagcaattaacatcaaacgatcatcgtttaaatgaagaagaagcacgtgcgtacttaaag 

tatattaaagcagaaagtgatttaaagcatgttgctgacaaagttgaagaaaacaccaaa
gatattaaaaataatcactataacaatttatctgtagatgcaaatgataataatatttta
aatatatctaaagacgggaaaaaatatgttttttttgataactatcaatttaatgttcct 
caaaaaacaattacgctcgtttcaagtgatagtggtgaaattacttatgaatttaacggg 
gataaacatcatatttctgtagaagaagatgacgataaagaattaggaacatttcctatc 

ggtgattataatttaaaagcatcaaaagacatggaaggtaaaaattttaaaggcgctatt
acaattgatatgagtgaaagtgatagtattgcatatgaatcgtttaaacaaaaacgtttt 
aatgttgatactgaaggcggatatatattagataatgtaaaaatatatgctaatggtaaa 
gaaataggcgatggtttttcatcagaaacatatggtccttatgatccagatgaagaagtt 
atcgttcacgctgaaggttcatacgaaggaaaaacttttaaatcaaattcggtcaatgta 

gcaagtgcaagcgaaaaagatggtggtgtgacagatgtcacagtcaaattcgatgaagaa
gctattgatcagtatgtcgataaaaaattagatgaaaaatacgatgattcagatgacgag 
tcagataacgattcaagtagtggcgaagtaacgcgtgaaaatgtaattgataaagtagag 
tcatatgaaggacatacactagatactgatacgtatacgtataaagaacctgaaaaaacc 
ggtgatggtaaatggggtttttcattcttagataaagatggagatttagctggatcgtac 

acggtagacattgacgacggttatgttacagaatatgacgaagatggtgaagaagttgga
tctggttattaa 

Sequence 1860 

MKYCSNCGQPLREGVKVCTNCGTPVRNDGPNYKHSEQRYSHQQPRSNKSNKKTWLIVTIV 
LAIIIALVVIFTIAKNQMSPEKQATHIAHAIKKDDAKSLSKQLTSNDHRLNEEEARAYLK
YIKAESDLKHVADKVEENTKDIKNNHYNNLSVDANDNNILNISKDGKKYVFFDNYQFNVP 
QKTITLVSSDSGEITYEFNGDKHHISVEEDDDKELGTFPIGDYNLKASKDMEGKNFKGAI 
TIDMSESDSIAYESFKQKRFNVDTEGGYILDNVKIYANGKEIGDGFSSETYGPYDPDEEV 
IVHAEGSYEGKTFKSNSVNVASASEKDGGVTDVTVKFDEEAIDQYVDKKLDEKYDDSDDE 
SDNDSSSGEVTRENVIDKVESYEGHTLDTDTYTYKEPEKTGDGKWGFSFLDKDGDLAGSY
TVDIDDGYVTEYDEDGEEVGSGY* 

Sequence 1861

Contig_0664_pos_1711_1025,

is similar to (with p-value 1.0e-29)

>sp:sp|P44068|Y882_HAEIN HYPOTHETICAL PROTEIN HI0882.
>pir:pir|E64015|E64015 hypothetical protein HI0882 - Haemophilus
influenzae (strain Rd KW20) >gp:gp|U32770|U32770_2 Haemophilus
influenzae Rd section 85 of 163 of the complete genome.
NID: g1573898.

atgaaaaatcaaaaaatcaactatgataaagtattaagaaagataatttctcaatgggaa 
cgtgatggagaacgccctaaaatcttacttcatagttgttgtgcaccttgtagtacatat 
acgttagagtttttaacacaatatgcggatatagcgatttattttgcgaatcctaatata 
catcccaaaagtgaatatttaagacgtgctaaagttcaagaacaatttgtaaatgatttt 

aataataaaacaggtgcaaatgtaaagtatattgaagccgaatatgaaccgcataaattt
atgaaaatggcaaaagataaaggtttaactgaagagccggaaggtggactaagatgtacg 
gcttgtttcgagatgcgattagaaattgttgcaaaagctgctttagaacatggttacgat 
tattttggtagtgcaatcacactctctccaaagaaaaatgcgcaattaatcaatgaacta 
ggtatggatgtacaaaatatatataatgtaaaatatttaccaagtgattttaaaaagaat 

aaagggtatgaacgttctatcgaaatgtgtaatgattataatatttttagacaatgttat
tgtggttgtgtatttgcagcgatgaagcaaggtatagattttaaacaaataaataaagat 
gctcaagcatttttacaacaattttaa 

Sequence 1862 
MKNQKINYDKVLRKIISQWERDGERPKILLHSCCAPCSTYTLEFLTQYADIAIYFANPNI
HPKSEYLRRAKVQEQFVNDFNNKTGANVKYIEAEYEPHKFMKMAKDKGLTEEPEGGLRCT
ACFEMRLEIVAKAALEHGYDYFGSAITLSPKKNAQLINELGMDVQNIYNVKYLPSDFKKN
KGYERSIEMCNDYNIFRQCYCGCVFAAMKQGIDFKQINKDAQAFLQQF*

Sequence 1863 

Contig_0665_pos_3731_4393,

is similar to (with p-value 4.0e-36) 

>gp:gp|Y17797|EFY17797_7 Enterococcus faecalis gph, ydjH, ydjG,
ydjl, pbp4 and ydiC, ORF2 and ORF3 genes, partial.
NID: g3341430.

atgaactatcttctaattgatacgtcaaaccaacctttatcagtagctattatgaaagat 
aatgaagtgattgctgaaaaaacaactgatatcaaaaagaatcattcagtgcaactcatg 
cccgaaatagcagaaattcttacagaaagtaaaataaataaaacagaaataactgatatc 

gtggtagcggaaggtccaggttcatataccggtcttagaataggggttactgttgctaaa
acattagcgtatgcattaaacactaacttatatggtgtctcatcacttaaagcacttgct 
agcacagtaaaagacagtacgaagttgttagtacccatttttgatgctagaagagaagca 
gtttatgcaggtgtttatcaatatcaggacaatgaattaataaccattattgatgacact 
tatatacctatttttgaacttattgaaaaacttcatcaattaaaccaaccttatgtgttt 

gtaggatttcatatcgaaaaaataaaacatttattagacagtgacatcgtagaacagtta
ccacaagcttcaagtatgaagcaattaatccaaaaaccagaaaatatacattcatttact 
cctaaatatcataaattatcagaggcggaacgaaattggttaaaccaacaagagaacaat 
tga 

Sequence 1864

MNYLLIDTSNQPLSVAIMKDNEVIAEKTTDIKKNHSVQLMPEIAEILTESKINKTEITDI 
VVAEGPGSYTGLRIGVTVAKTLAYALNTNLYGVSSLKALASTVKDSTKLLVPIFDARREA 
VYAGVYQYQDNELITIIDDTYIPIFELIEKLHQLNQPYVFVGFHIEKIKHLLDSDIVEQL
PQASSMKQLIQKPENIHSFTPKYHKLSEAERNWLNQQENN* 

Sequence 1865 

Contig_0665_pos_4405_4827,

putative peptide of unknown function 

atgagtgtggaagatgttcctaaagtttttgatatagaaagaaatagtttctcacacagt 
tcgtggtcaatcgatgcattttatcatgaaatagaaaacaacgaatttgctacatatttt
gttatagaatttagtgacaaaataattggatatgttggtttatggttagtcgttgatcaa 
gcacaaattacaacgatagctatatcaaaggcatttagaggctatggtcttgggcaactt
ttacttaaatatgcaatgaactatgcacgtttttcttgtgatgtgatgagtttagaagta 
agaatagataatgatgttgcacaacatgtttataggaatttgggattccaaaatggtgga 
aaaagaaagaattattatggagaaggcgaggacgcattagtcatgtgggtgaatttgaaa
tga 

Sequence 1866 

MSVEDVPKVFDIERNSFSHSSWSIDAFYHEIENNEFATYFVIEFSDKIIGYVGLWLVVDQ 
AQITTIAISKAFRGYGLGQLLLKYAMNYARFSCDVMSLEVRIDNDVAQHVYRNLGFQNGG
KRKNYYGEGEDALVMWVNLK* 

Sequence 1867 

Contig_0665_pos_3707_3267, 
is similar to (with p-value 1.0e-67)

>gp:gp|U71377|SEU71377_1 Staphylococcus epidermidis autolysin
AtlE and putative transcriptional regulator AtlR genes,
complete cds. NID: g2267238.

atgtatcttggtggtagtactgaaattaaaacatcacaacttaaaggtaaggatgactac 
ttaaatgatatatactattaccacccaagcgtaaaaagtattatggaatattcaaatctt
ttacgtaatgatttagatttatctaaaataacaaacaaaaacgatttcttagatcaaaga 
gtcattaaacgatatggttcactcgtacccttaacagaattagatgaagacttattgcgt 
aagaaccaaaaggaatcgactgatagtcagaaagagtctgattcctcatcacaaaataat 
gatgaagaagatcaaactaacgaacaaacagaccaaaatagcttaaacggaaacgaacag 
tacccaaatcaacaagacaacaatcaaaccaatggtgaaaatggtatgataaataatgac
aattatccttacgcacaataa 

Sequence 1868 

MYLGGSTEIKTSQLKGKDDYLNDIYYYHPSVKSIMEYSNLLRNDLDLSKITNKNDFLDQR
VIKRYGSLVPLTELDEDLLRKNQKESTDSQKESDSSSQNNDEEDQTNEQTDQNSLNGNEQ 
YPNQQDNNQTNGENGMINNDNYPYAQ* 

Sequence 1869
Contig_0665_pos_3219_2749,

is similar to (with p-value 1.0e-88)

>gp:gp|U71377|SEU71377_2 Staphylococcus epidermidis autolysin
AtlE and putative transcriptional regulator AtlR genes,
complete cds. NID: g2267238.

atgagtcgtaaaacatatgaaaaaattgctaatattaatggtatgttcaacgtacttgaa
caacaaatcattcatagtaaagatatggcattatttaggaatgaatttttctatgtaaac 
catgaacacagagaaaattatgaagcgcttttgatctactacaaggatagtttagataat 
cccgttgtagatggcgcatgttacattcttgcattacctgaaatatttaataaggttgat 
gtgtttgagtctgatttaccttttacctgggtttatgatgaaaatggactttctgataca 

atgaaaagtatcagtgtgccccttcaatatcttattgcagccgctttggaagtgactgat
gtaaatatatttaagccttcaggttttactatgggcatgaacaattggaatattgcccaa 
ttacgtattttttggcaatactgcgctatcgtcagaaaagaggctttataa

Sequence 1870 

MSRKTYEKIANINGMFNVLEQQIIHSKDMALFRNEFFYVNHEHRENYEALLIYYKDSLDN
PVVDGACYILALPEIFNKVDVFESDLPFTWVYDENGLSDTMKSISVPLQYLIAAALEVTD 
VNIFKPSGFTMGMNNWNIAQLRIFWQYCAIVRKEAL*

Sequence 1871 
Contig_0665_pos_2148_1753,

is similar to (with p-value 2.0e-74)

>gp:gp|U71377|SEU71377_3 Staphylococcus epidermidis autolysin
AtlE and putative transcriptional regulator AtlR genes,
complete cds. NID: g2267238.

atgctagatgattgctttgaaataagaaagtgtgttttcgtcgaagaacaaggcgtacca
ctcgaaaatgaatttgatcaatatgaagattactcattccatatagtgggatatataaat 
ggtgttcctatggcaactgctagaattagacctttaaatactcatatttgtaaaattgaa 
cgtgtagcaatcatcaagtggtatcgtggtcttgggtacggtaaaaatttaatacatgct 
attgaaacaattgcaaaaaaacaccaatacaatgaactcactatgaatgctcaattacaa 

gctcgagacttttacttaaaactaggttactcaccttttggtaaagtattcttagaagaa
aatataaatcatattagtatgaataagtttttataa 

Sequence 1872 

MLDDCFEIRKCVFVEEQGVPLENEFDQYEDYSFHIVGYINGVPMATARIRPLNTHICKIE 
RVAIIKWYRGLGYGKNLIHAIETIAKKHQYNELTMNAQLQARDFYLKLGYSPFGKVFLEE
NINHISMNKFL* 

Sequence 1873 
Contig_0665_pos_0_525,
is similar to (with p-value 0.0e+00)

>gp:gp|U71377|SEU71377_4 Staphylococcus epidermidis autolysin
AtlE and putative transcriptional regulator AtlR genes,
complete cds. NID: g2267238.

atgaaagcaccaagaattgaagaagattatacgtcatatttccctaaatatggctataga 
aacggtgtgggacgtcctgaaggtatcgttgttcatgatactgcaaatgataactcaaca
atcgatggcgagattgctttcatgaaacgtaattacacaaatgcattcgtacacgcattt 
gttgatggcaatagaattatagaaacagctccgacagattacttatcttggggtgcaggt
ccatatggaaatcaacgttttatcaatgttgaaatcgtccatacacatgattatgattca 
tttgcacgttcaatgaacaactacgctgattatgctgcaacgcaattgcaatattataat 
ttaaaacctgatagcgctgaaaacgatggaagaggaacagtttggacacatgctgctatc
tctaacttcttaggaggtactgatcacgctgaccctcaccaatatttaagaagtcacaat 
tatagctatgcagaattatatgacttaatttatgaaaaatattta 

Sequence 1874

MKAPRIEEDYTSYFPKYGYRNGVGRPEGIVVHDTANDNSTIDGEIAFMKRNYTNAFVHAF 
VDGNRIIETAPTDYLSWGAGPYGNQRFINVEIVHTHDYDSFARSMNNYADYAATQLQYYN 
LKPDSAENDGRGTVWTHAAISNFLGGTDHADPHQYLRSHNYSYAELYDLIYEKYL 

Sequence 1875

Contig_0667_pos_840_1853,

is similar to (with p-value 0.0e+00)

>sp:sp|P45554|DNAK_STAAU DNAK PROTEIN (HEAT SHOCK PROTEIN 70)
(HSP70). >gp:gp|D30690|STANHS_3 Staphylococcus aureus genes
for ORF37; HSP20; HSP70; HSP40; ORF35, complete cds.
NID: g487326.

atgggtacagattataaagtagatattgaaggtaaatcatatacaccacaagaactttca 
gcaatgattttacaaaatttaaaaagcactgcagaaaactatttaggggatacagtagac 
aaagctgttatcactgtccctgcttatttcaatgatggtgaacgtcaagcaactaaagat 

gctggtaaaattgcaggcttagaagttgaacgtattatcaacgaacctacagctgctgca
cttgcttatggtttagataaaactgaaacagatcaaaaggttctcgtatttgacttaggt 
gggggaacatttgacgtatctattctagagttaggcgacggcgtatttgaagtattatca 
actgccggagataataaacttggtggcgatgacttcgaccaagtgattattgattatctt 
gtttcagaattcaagaaagagaatggtgtagatttatcacaagataaaatggcattacaa 

agattaaaagatgctgccgaaaaagctaaaaaagatttatcaggtgtttctcaaactcaa
atttcattaccattcatttctgctggagaaaatggcccattacacttagaaattagttta 
actcgttctaaatttgaggaattagctgattcattaatcaaaaaaactatggaaccgact
cgtcaagcattaaaagatgctggtttatctacttcagaaatagatgaagttattttagtt 
ggtggttcaacacgtattccggccgttcaagaagctgttaaaaaagaaattgggaaagaa 

ccacataaaggtgttaacccagatgaagttgtagcaatgggtgctgctattcaagctggt
gtaatcacaggtgatgttaaagatgtagtattacttgatgttacgccattatctttaggt 
atcgaaattatgggtggacgtatgaacacattaattgaacgtaatactactattccaact 
tccaaatcacaagtttattctacagcagctgacaatcaaccagcagtagtgtaa 

Sequence 1876

MGTDYKVDIEGKSYTPQELSAMILQNLKSTAENYLGDTVDKAVITVPAYFNDGERQATKD 
AGKIAGLEVERIINEPTAAALAYGLDKTETDQKVLVFDLGGGTFDVSILELGDGVFEVLS 
TAGDNKLGGDDFDQVIIDYLVSEFKKENGVDLSQDKMALQRLKDAAEKAKKDLSGVSQTQ
ISLPFISAGENGPLHLEISLTRSKFEELADSLIKKTMEPTRQALKDAGLSTSEIDEVILV 

GGSTRIPAVQEAVKKEIGKEPHKGVNPDEVVAMGAAIQAGVITGDVKDVVLLDVTPLSLG
IEIMGGRMNTLIERNTTIPTSKSQVYSTAADNQPAVV* 

Sequence 1877 

Contig_0667_pos_3958_4263,

putative peptide of unknown function

atgctcctttctgctatactccttataaaaggaggtgaaaatatgaaaagttttattatt 
gcgtatgatttaaataaccaaaaggattatccaaaattaatagagcgtattgaggattat 
cctaatgttgctaaaatcaataaatcagtttggtttattaattcaactaatgatgctaaa 
actattagaaacgaattaaaaatgtttattgatagcgatgatagtttgttcgttggtaag 

ctgactggtgaagccgcatggtctaatgtaatttgcagttcacaacatttaaaagattat
ctttag 

Sequence 1878 

MLLSAILLIKGGENMKSFIIAYDLNNQKDYPKLIERIEDYPNVAKINKSVWFINSTNDAK 
TIRNELKMFIDSDDSLFVGKLTGEAAWSNVICSSQHLKDYL*

Sequence 1879 

Contig_0667_pos_4836_5150,

putative peptide of unknown function 
atgtgcttttcaaaaagaatgaaacaatcaagagaaaaacaaggtatgactttagctgaa
ctaggaagaaaaatcggtaaaactgaagctactgtacaacgttatgaaagcgggaatatt 
aaaaatcttaaaaatgatactattgaaagtatagctactgcattaaatgttaaccctgct 
ttcttgatgggttggatagaagaagttgaggaacaaccacaacatcgtgcagcgcatctt 
gatggtgatttaactgacgaagaatggcaagaaattcttgattacgctgaatacataaga
agtaaaagaaaataa 

Sequence 1880 

MCFSKRMKQSREKQGMTLAELGRKIGKTEATVQRYESGNIKNLKNDTIESIATALNVNPA 
FLMGWIEEVEEQPQHRAAHLDGDLTDEEWQEILDYAEYIRSKRK*

Sequence 1881 

Contig_0667_pos_5154_5624, 

putative peptide of unknown function 

gtgttttatgtggggaaatatgaagatatgttaattgaacatgactatattgaagtcatt
gaatgtgataacttacctaaaaggttatctggtttgtggcttggagatatgattttaatt 
aatcgtaacttgcctattacttccaaacttgaaacacttgcagaggaactcgctcataac 
gaacttacatatggaaatatagttgatcaaagtagttttaatcatagaaaatttgaaggt 
tatgcacgtaggttagcctatgaaaagttaatccctcttaaagatattgtaaaagcattt 

ttgcaaggcattcatgacttgtatgaacttgctaatttttttgaggttacagaaggtttt
gtcctacaaagtattgaacattataaacaaaaatatggttattccactcggtatagcaaa 
tacgttattcaatttgagccgttacgagtgtttgaatataaagatatatag 

Sequence 1882 

VFYVGKYEDMLIEHDYIEVIECDNLPKRLSGLWLGDMILINRNLPITSKLETLAEELAHN
ELTYGNIVDQSSFNHRKFEGYARRLAYEKLIPLKDIVKAFLQGIHDLYELANFFEVTEGF 
VLQSIEHYKQKYGYSTRYSKYVIQFEPLRVFEYKDI* 

Sequence 1883 
Contig_0670_pos_3246_2653,

putative peptide of unknown function 

atgggccctcaatattggtggccagcagaaacgccaatagaaatgatgcttggggcaatt 
ctagtccaaaatactaattggaacaatgcagatatagcgttatcaagattaaaagaagaa 
acttcttttaatgcacagacgatattgaaaatgcctttagaatcgttgcagcaagtgata 

cgttcgagtggtttctataaaaataaagctaaggctatacaggcattgttactatggtta
aatcaacatcattttgattatagtagtatagctaagttatacggtgatagcttaagaaaa 
gaattactcaccatccgtggtataggtgaagagaccgccgatgtcttaatagtatatatt
tttaaaggtaaagaattcatacctgatagttatactagacgtatttttagaaaattggga 
tatcaacatacagaaagttatcataaattgaaacaggaattaacacttcctgaatcattt

tcaaatcaagatgcaaatgagtttcacgctttattagataattttgggaaaaattatttt
aatggtaaggggaaacaacgctatacctttttagatacctattttaaaaaataa 

Sequence 1884 

MGPQYWWPAETPIEMMLGAILVQNTNWNNADIALSRLKEETSFNAQTILKMPLESLQQVI 
RSSGFYKNKAKAIQALLLWLNQHHFDYSSIAKLYGDSLRKELLTIRGIGEETADVLIVYI
FKGKEFIPDSYTRRIFRKLGYQHTESYHKLKQELTLPESFSNQDANEFHALLDNFGKNYF 
NGKGKQRYTFLDTYFKK* 

Sequence 1885 
Contig_0670_pos_2244_1846,

putative peptide of unknown function 

atgcaaaatttcaatttaaatgaacagtatgaaaaagaagcagcaagtaagtatggagat 
actcattattatcaagcatataaagataaacaaaaatgtaaggatgaatcagaacagcaa 
aatcattttgaggaaattaataagcaattaaatatgttttttgacgaaatgaatcaacta 
tacttaaacaaagtttctatacttgaagcaagtggaaaaactaagaaattacaatgtatt
ttgaaagaacaagttcctaattgtgacaatcaatttttagaatatatagctcagatttat 
attgaggacgagcgatttgtaaagtttattaacaagcaacgtgaacgtggtttgaattta 
tacatttctgacgcgataaaaacatttattaaattataa 
Sequence 1886

MQNFNLNEQYEKEAASKYGDTHYYQAYKDKQKCKDESEQQNHFEEINKQLNMFFDEMNQL 
YLNKVSILEASGKTKKLQCILKEQVPNCDNQFLEYIAQIYIEDERFVKFINKQRERGLNL 
YISDAIKTFIKL* 

Sequence 1887 

Contig_0670_pos_1409_510,

putative peptide of unknown function 

gtgatgctccaaatgaaaagtaaactatggatagttatatgtgcattgattgtagtactg 
gctgcatgtggtcaagatgccaatcattcatctaataataaagacactgaaaaaagcgat
aaaaaatatcatagaattatttcgctcattcctagtaacacagaaattttatatcgctta 
ggaatcggagaagatatagttggtgtatccactgtggatgattatcctaaagatgtaaaa 
aaaggtaaaaaacaattcgatgcgatgaatttaaataaagaagaattaataaaagctaaa 
ccggatttgattttagcgcatgagtcacagaaaaattctgcaggtaaagtgctaaagtca 
cttaaagataagggagtaaaagtcgtttatgtgaaagatgcacaatcgattgatgaaact
tatgatacttttaaatcaattggacaattaacggatcgtgaaaaacaagctaaagaactt 
gttgatgaaacaaaacacaatgtagaaaaaatcattaactccgttcctaaacatcataag 
aaacaagaagtgtttatggaagtatcgtctaaaccagacatttacactgccggaaaagat 
accttctttaacgatatgttagagaaactagatgctaaaaatagttttgatgatgttaaa 
ggttggaaatcagtaagtaaagaaagcattattaaacgtaatcctgatattctgatttcc
acagaaggtaaatcaaaatcagactacatagaaatgataaaaaaacgtggcggttttgat 
aaaattaatgctgttaaaaatacacgtattgaaacagtagatggggatgaagtttctcga 
ccaggtcctcgtattgatgaaggtctaaaggatttaagagacgatatatataaaaaatag 

Sequence 1888 

VMLQMKSKLWIVICALIVVLAACGQDANHSSNNKDTEKSDKKYHRIISLIPSNTEILYRL
GIGEDIVGVSTVDDYPKDVKKGKKQFDAMNLNKEELIKAKPDLILAHESQKNSAGKVLKS
LKDKGVKVVYVKDAQSIDETYDTFKSIGQLTDREKQAKELVDETKHNVEKIINSVPKHHK
KQEVFMEVSSKPDIYTAGKDTFFNDMLEKLDAKNSFDDVKGWKSVSKESIIKRNPDILIS
TEGKSKSDYIEMIKKRGGFDKINAVKNTRIETVDGDEVSRPGPRIDEGLKDLRDDIYKK*



Sequence 1889 
Contig_0673_pos_1644_1195,

putative peptide of unknown function 

gtgctaaagcaaacgttgattatagaaattgcaagatttgttcctagtatgaaacttaaa 
aataaaatatataaaaagcttttaaaaatggatgtcgggaatcatacttcatttgcatat 
aaagtgttgcctgatttgttttatccagaatacatttcggttggcaagaatacagtcatt
ggttataatacaacaatattaactcacgaagtacttgttgatgagtggagagtaggaaaa
gtcattataggcgattacactttaataggtgcaaatacgacaatattaccaggaataacc 
ataggaaatcatgttaaaattggtgcgggtacggttgtgtctaaagatgttcccgattac 
agttttgcatttggtaatcctatgcaaatacaattagattcaggaggtgacaatgaatgg 
cacaaaaagaaaataacatcattccaatga 

Sequence 1890 

VLKQTLIIEIARFVPSMKLKNKIYKKLLKMDVGNHTSFAYKVLPDLFYPEYISVGKNTVI 
GYNTTILTHEVLVDEWRVGKVIIGDYTLIGANTTILPGITIGNHVKIGAGTVVSKDVPDY
SFAFGNPMQIQLDSGGDNEWHKKKITSFQ*

Sequence 1891 

Contig_0673_pos_0_868,

putative peptide of unknown function 

atgcttgaaaaaacattcgaagtcacgtatacaaatgaacaaaaaattgaattagaagca 
caattgttttcaacacaacttttatttcaatttctcttttcgcaaggtaggttagaagaa
gcccgaacatatattttgaatcaatcttacgagatacaacagcatagggtgattaggaat 
ttacttgcaatgtgttatttgtatctaggtgagtatgatagcgccaaagcaatgtttgaa 
gaacttttaaaggaagataattcagacgtgcatgcactttgtcactacacattattactt
tataataaaaaagaaacagaaaaatatcaaaaatatcttaaaatacttaataaagtagta 
ccactaaatgacgacgaaacctttaaattaggaatcgtattgagttatttaaaacagtat
cgtgcttctcaaaatttactttatccactttataaaaaaggtaaatttgtctctattcaa 
atgtataatgcattgagtttcaatttttattacctaggaaataaagacgaaagtattgag 
atgtggaacaagctcactcaaatttctgaagttgatgttggttatgcaccttgggtaatt 
gaggaaagtaaaacggtatttgaatcacgagtgttaccattattactagatgataataat
cattatcgactttacggtatttttttacttcatcaattaaatggaaaagaaatactaatg
actgaagatatttggtcaattcttgaatcaatgaatgactatgagaaactttatctcaca 
tatttggtacaaggactcacactcaataaattagattttatacacagaggtatgcaaagg 
ttgtataattttaagaaattcCAAGAAA 

Sequence 1892 

MLEKTFEVTYTNEQKIELEAQLFSTQLLFQFLFSQGRLEEARTYILNQSYEIQQHRVIRN 
LLAMCYLYLGEYDSAKAMFEELLKEDNSDVHALCHYTLLLYNKKETEKYQKYLKILNKVV 
PLNDDETFKLGIVLSYLKQYRASQNLLYPLYKKGKFVSIQMYNALSFNFYYLGNKDESIE 
MWNKLTQISEVDVGYAPWVIEESKTVFESRVLPLLLDDNNHYRLYGIFLLHQLNGKEILM
TEDIWSILESMNDYEKLYLTYLVQGLTLNKLDFIHRGMQRLYNFKKFQEX 

Sequence 1893 
Contig_0675_pos_348_1073,
is similar to (with p-value 2.0e-48)

>gp:gp|AF080002|AF080002_2 Heliobacillus mobilis exopolyphosphatase
Ppx (ppx) gene, partial cds; cobyric acid synthase CobQ (cobQ),
UDP-N-acetylmuramyl tripeptide synthetase MurC (murC), glutamyl tRNA
reductase HemA (hemA), photosynthesis gene cluster, complete sequence,
stage II sporulation protein E Sp2E (sp2E), cell cycle protein MesJ
(mesJ), and ATP-dependent zinc metallopeptidase FtsH (ftsH) genes,
complete cds; and nucleoside diphosphate kinase B NdkB (ndkB) gene,
partial cds. NID: g3820536. >gp:gp|AF080002|AF080002_2 Heliobacillus
mobilis exopolyphosphatase Ppx (ppx) gene, partial cds; cobyric acid
synthase CobQ (cobQ), UDP-N-acetylmuramyl tripeptide synthetase MurC
(murC), glutamyl tRNA reductase HemA (hemA), photosynthesis gene
cluster, complete sequence, stage II sporulation protein E Sp2E (sp2E),
cell cycle protein MesJ (mesJ), and ATP-dependent zinc metallopeptidase
FtsH (ftsH) genes, complete cds; and nucleoside diphosphate kinase B
NdkB (ndkB) gene, partial cds. NID: g3820536.

atgaatgaactaacggtttatcatttcatgtcagataagcttaatttatacagtgatatt 
ggtaatatcatggcattaaaacaaagagctaaaaaaagaaatattaagataaatgttaaa 

gaaatcaatgagactgagggagtcacatttgatgattgtgatattttcttcattggtggt
gggagtgatagggaacaagcgcttgccacgaaagaattaagtaaaattaaaacttcttta 
aaaaatgcaattgaagatggtatgcctgggttaactatatgcggtggttatcaattttta 
ggtcataagtatattactcctgatggtaccgagttagaaggattgggtgttcttgacttc 
tataccgagtctaaaaaagaacgcttaactggagatatcattatagagagtgatactttt 

ggcacgattgttggatttgaaaatcatggtggtagaacatatcatccgtatggaacatta
ggccgagtaacgtatggttatggtaataatgataacgatcgaaaagaaggtatacactat
aaaaatctattaggttcttatcttcacggtccaattttaccaaaaaatcatgaaataact 
gattatctacttgagaaagcatgtgaaagaaaagggatactatttgagcctaagaagatc 
gataacacagaggaagaagctgctaagcaagttctgattaaacgtgcaaaagaaaataaa 

aaataa

Sequence 1894 

MNELTVYHFMSDKLNLYSDIGNIMALKQRAKKRNIKINVKEINETEGVTFDDCDIFFIGG
GSDREQALATKELSKIKTSLKNAIEDGMPGLTICGGYQFLGHKYITPDGTELEGLGVLDF
YTESKKERLTGDIIIESDTFGTIVGFENHGGRTYHPYGTLGRVTYGYGNNDNDRKEGIHY
KNLLGSYLHGPILPKNHEITDYLLEKACERKGILFEPKKIDNTEEEAAKQVLIKRAKENK

K* 

Sequence 1895 
Contig_0675_pos_3361_4116,

is similar to (with p-value 2.0e-44) 

>sp:sp|P19994|AMPM_BACSU METHIONINE AMINOPEPTIDASE (EC 3.4.11.18)
(MAP) (PEPTIDASE M). >pir:pir|JS0493|JS0493 methionyl aminopeptidase
(EC 3.4.11.18) - Bacillus subtilis >gp:gp|L47971|BACRPLP_16 Bacillus
subtilis ribosomal protein (rplPNXEFROQ, rpmCDJ, rpsQNHEMK) genes,
integral membrane protein (secY) gene, adenylate kinase (adk) gene,
methionine aminopeptidase (map) gene, inititation factor 1 (infA)
gene, RNA polymerase alpha (rpoA) gene. NID: g1044970.
>gp:gp|Z99104|BSUB0001_138 Bacillus subtilis complete genome
(section 1 of 21): from 1 to 213080. NID: g2632267.
>gp:gp|D00619|BACSECY_5 Bacillus subtilis genes for ribosomal
proteins, SecY, adenylate kinase and methionine amino peptidase,
complete cds. NID: g216336.

atgattgttaaaactgatgaagaattacaagcgttaaaagaaataggttacatttgtgca 
aaagtcagagatactatgaaagaagctactaaacctggagtgactacccgtgaattagat 
catattgccaaagatttatttgaagagcatggggcgatatcagcgcctattcacgatgaa 
aacttcccaggtcaaacttgcattagtgttaacgaagaggtcgcgcatggaatccctggt 

aaacgagtaattcgtgaaggtgacctagttaatattgatgtatcagctttaaaaaatggg
tactatgctgacactggaatttcatttgttgtagggaaatcagatcaaccacttaaacaa
aaggtttgtgacgtagccacaatggcttttgaaaatgctatgaaaaaggtgaaaccaggt 
acaaaattaagtaatattggaaaagctgttcatgcaactgcacgccaaaatgacttgact 
gtgattaaaaatttaactggacatggtgttggacaatcacttcatgaggcacctaatcat 

gtcatgaattattttgatcctaaagataaaacattattaaaagaagggcaagtcattgca
gtagaaccattcatatcaacacatgctacatttgtaactgaaggtaaaaatgaatgggct 
tttgaaactaaagataaaagttatgtcgctcaaattgaacacacggttatagttacaaaa 
gatggtccgttacttacaactaagattgatgattaa 

Sequence 1896

MIVKTDEELQALKEIGYICAKVRDTMKEATKPGVTTRELDHIAKDLFEEHGAISAPIHDE
NFPGQTCISVNEEVAHGIPGKRVIREGDLVNIDVSALKNGYYADTGISFVVGKSDQPLKQ 
KVCDVATMAFENAMKKVKPGTKLSNIGKAVHATARQNDLTVIKNLTGHGVGQSLHEAPNH 
VMNYFDPKDKTLLKEGQVIAVEPFISTHATFVTEGKNEWAFETKDKSYVAQIEHTVIVTK 

DGPLLTTKIDD*

Sequence 1897 

Contig_0675_pos_4702_0,

putative peptide of unknown function 

atgttgattatttttactgctttaatgattattgctaatttttactatatattttttgaa
aaaattggctttttactagtactcctattaggatgtgtacttgtatatgtagggtatgtg 
tattttcataaagtaagaggactactatctttttggataggaaccttattaattgctttt 
acacttttgtctaataagtacacgataattattctatttatatttttaatagtagtcatc 
atacgttatttggtttataagtttagacctttaaaagtgattgctacagatgaagaaatc 

acatcacccatttttattaagcaaaaatggtttggtgaacaacatacaccagtgtatgta
tataaatgggaagacgtacagattcaacacggtataggagacatacacattgatatgaca 
aaagcggcaaatattaaggaaacaaataccatagttgtgcgtcatattttaggtaaagta 
caagtagttgtacctcttaattataatataaatttacatgcgactctcttctacggcact 
gcttatgtgaacgataaatcttataagattgagaataaccatgttcaaattgaagaaaaa 

acgaaagatgataattatactgttaatgtttacgtttcat

Sequence 1898 

MLIIFTALMIIANFYYIFFEKIGFLLVLLLGCVLVYVGYVYFHKVRGLLSFWIGTLLIAF
TLLSNKYTIIILFIFLIVVIIRYLVYKFRPLKVIATDEEITSPIFIKQKWFGEQHTPVYV
YKWEDVQIQHGIGDIHIDMTKAANIKETNTIVVRHILGKVQVVVPLNYNINLHATLFYGT
AYVNDKSYKIENNHVQIEEKTKDDNYTVNVYVSX 

Sequence 1899 

Contig_0675_pos_3106_2204, 
putative peptide of unknown function

atgttgttaaacctcactccaatttttgccatacttactgcaattgttacaattgaacct 
actgctaaagcatcattaaaaaaggggtataaaaggctgccagcaacagttatcggtgcg 
ttatttgctgttgtctttacatatgtcttcggtgatcaatcaccgttaagttatgcttta 
agtgctacatttaccattctgatatgcactaaacttaatttacaggtaggaacaactgtc
gcagtattaacttccgttgcaatgattccaggtatacatgaagcatatgtgttcaatttc 
ttttcacggttacttacagctcttataggacttgttacagctggattagtcaattttatc
atcttaccacctaagtattatcatcaacttgaagagcaattagcccttagtgagaaaaaa 
atgtatcgtttattttatgaacgctgtaatgagttattattaggaaaattcagctcggaa 

aagactagtaaagaattatcaaaattaaatattattgctcaaaaagttgaaacattaatg
agttaccaaagagatgaacttcattatcataaaaatgaagataattggaaattattaaat 
cgccttacaaatcgcgcttataacaaccgtttatttatttcacatttatctaacattatt 
tatttacccaaacatacgtctattgcttttgatgctaatgagaagatagcattgattaat 
attagtaatagtattaatgacatcattcaaaaaggaagctttgcacgtcaaaaaaaatct 

attgcaacactaaagtcttctgttaaacagatggatgagtttgaccaaaatcaaatgaaa
agtacactcatatatgaaattctactcatatacaaaattttagattcacgttatgcaaaa 
taa 

Sequence 1900 

MLLNLTPIFAILTAIVTIEPTAKASLKKGYKRLPATVIGALFAVVFTYVFGDQSPLSYAL
SATFTILICTKLNLQVGTTVAVLTSVAMIPGIHEAYVFNFFSRLLTALIGLVTAGLVNFI 
ILPPKYYHQLEEQLALSEKKMYRLFYERCNELLLGKFSSEKTSKELSKLNIIAQKVETLM
SYQRDELHYHKNEDNWKLLNRLTNRAYNNRLFISHLSNIIYLPKHTSIAFDANEKIALIN
ISNSINDIIQKGSFARQKKSIATLKSSVKQMDEFDQNQMKSTLIYEILLIYKILDSRYAK

Sequence 1901 

Contig_0678_pos_933_2351, 

putative peptide of unknown function 

gtgattgaattaattaaaatggaagggatgatagttgtgtctaataataattttaaagat
gatttcgaaaagaatcgtcaatctattaatccagacgaacatcaaacagaattaaaagaa 
gatgataaaacaaatgaaaataaaaaagaagctgactctcaaaacagtttatctaataac
tcaaatcaacaatttcctccgagaaatgcccaacgacgaaaaagacgcagagagacagca 
actaatcaaagcaaacaacaagacgacaaacatcaaaaaaatagtgacgctaaaactaca 

gaaggttcattagatgaccgttatgacgaagcacagttacagcaacaacatgataaatcg
caacaacaaaataaaactgaaaaacaatcacaagataatagaatgaaagatggaaaagat 
gcagctattgtaaatggaacatctgagtcaccagaacataaatcaaaatcaacacaaaat 
agacccggccctaaagctcaacaacaaaagcgtaaatcagaaagtacgcaatcaaaaccg 
tcaacaaacaaagataaaaaagcagctacaggtgctggaatagctggtgcagctggtgtt 

gctggtgcagcagaaacatccaaacgtcatcataataaaaaagataaacaagattctaaa
cactcaaaccatgagaatgacgaaaaatctgttaaaaatgatgaccaaaagcaatctaaa 
aaaggcaaaaaagcagcagtcggtgctggcgcagctgcaggagttggtgcggctggtgtt 
gcgcatcataataatcaaaataaacatcataatgaggaaaaaaattctaatcaaaacaat 
cagtacaatgaccaatcagaaggtaagaaaaaaggtggtttcatgaaaatcttgttacca 

45 cttatagcagccattcttattctaggtgcaatagcaatattcggtggtatggctctaaat 
aatcacaacgatagtaaaagtgatgaccaaaaaatagcgaatcaaagtaagaaagactca 
gataaaaaagatggtgcgcaatccgaagataacaaagacaaaaaatctgatagtaacaaa 
gacaaaaaatctgattctgataagaacgcagatgatgactctgataatagttcctcaaat 
cctaacgctacttcaactaataataacgataatgtagccaataataactcaaattataca 

50 aaccaaaatcaacaagataatgcaaaccaaaatagcaataatcaacaggcaactcaaggt 
caacaatcacatacagtatacggtcaagaaaacttatatcgtatcgccatacaatattat 
ggagaaggaactcaagctaacgtagataaaattaaacgtgcgaatggattaagcagtaat 
aatattcataatggtcaaacattagttattcctcaataa 

55 Sequence 1902 

VIELIKMEGMIVVSNNNFKDDFEKNRQSINPDEHQTELKEDDKTNENKKEADSQNSLSNN 
SNQQFPPRNAQRRKRRRETATNQSKQQDDKHQKNSDAKTTEGSLDDRYDEAQLQQQHDKS 
QQQNKTEKQSQDNRMKDGKDAAIVNGTSESPEHKSKSTQNRPGPKAQQQKRKSESTQSKP 
STNKDKKAATGAGIAGAAGVAGAAETSKRHHNKKDKQDSKHSNHENDEKSVKNDDQKQSK
KGKKAAVGAGAAAGVGAAGVAHHNNQNKHHNEEKNSNQNNQYNDQSEGKKKGGFMKILLP
LIAAILILGAIAIFGGMALNNHNDSKSDDQKIANQSKKDSDKKDGAQSEDNKDKKSDSNK 
DKKSDSDKNADDDSDNSSSNPNATSTNNNDNVANNNSNYTNQNQQDNANQNSNNQQATQG 
QQSHTVYGQENLYRIAIQYYGEGTQANVDKIKRANGLSSNNIHNGQTLVIPQ*
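
The contig labels in this listing carry two positions, and the span between them can be compared with the length of the printed nucleotide entry as a quick consistency check. The Python sketch below is an illustration added for that purpose and is not part of the original listing. It assumes that the two numbers are 1-based inclusive coordinates and that a first position larger than the second indicates the reverse strand; neither point is stated in the listing itself. The names `ContigSpan` and `parse_contig_label` are hypothetical.

```python
import re
from typing import NamedTuple


class ContigSpan(NamedTuple):
    """Contig identifier plus the two positions given in a label."""
    contig: str
    first: int
    second: int

    @property
    def length(self) -> int:
        # Assumption: positions are 1-based and inclusive.
        return abs(self.second - self.first) + 1

    @property
    def strand(self) -> str:
        # Assumption: first > second means the reverse strand.
        return "+" if self.first <= self.second else "-"


_LABEL = re.compile(r"Contig_(\d+)_pos_(\d+)_(\d+)")


def parse_contig_label(label: str) -> ContigSpan:
    """Parse a label such as 'Contig_0678_pos_933_2351,'."""
    match = _LABEL.search(label)
    if match is None:
        raise ValueError(f"not a contig label: {label!r}")
    return ContigSpan(match.group(1), int(match.group(2)), int(match.group(3)))


span = parse_contig_label("Contig_0678_pos_933_2351,")
print(span.length, span.strand)  # prints: 1419 +
```

Under these assumptions the label of Sequence 1901 spans 1419 positions, which agrees with the 1419 nucleotides printed for that entry.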

5 

Sequence 1903 

Contig_0678_pos_2484_3470,

is similar to (with p-value 0.0e+00) 

>sp:sp|P50736|YPDA_BACSU HYPOTHETICAL 36.3 KD PROTEIN IN
RECQ-CMK INTERGENIC REGION. >gp:gp|Z99115|BSUB0012_235 Bacillus
subtilis complete genome (section 12 of 21): from 2195541 to
2409220. NID: g2634478. >gp:gp|Z99116|BSUB0013_7 Bacillus
subtilis complete genome (section 13 of 21): from 2395261 to
2613730. NID: g2634723. >gp:gp|L47648|BACSERA_12 Bacillus
subtilis phosphoglycerate dehydrogenase (serA), ypaA, ferredoxin
(fer), ypbB, recS, ypbD, ypbE, ypbF, ypbG, ypbH, glutamate
dehydrogenase (ypcA), ypdA, ypdB, ypdC, spore cortex lytic
enzyme (sleB), ypeB, ypfA, ypfB, cytidine monophosphate kinase
(cmk), ypfD, ypgA, yphA, yphB, yphC, NAD+ dependent
glycerol-3-phosphate dehydrogenase (glyc), yphE and yphF genes,
complete cds. NID: g1146195. >gp:gp|L47648|BACSERA_12 Bacillus
subtilis phosphoglycerate dehydrogenase (serA), ypaA, ferredoxin
(fer), ypbB, recS, ypbD, ypbE, ypbF, ypbG, ypbH, glutamate
dehydrogenase (ypcA), ypdA, ypdB, ypdC, spore cortex lytic
enzyme (sleB), ypeB, ypfA, ypfB, cytidine monophosphate kinase
(cmk), ypfD, ypgA, yphA, yphB, yphC, NAD+ dependent
glycerol-3-phosphate dehydrogenase (glyc), yphE and yphF genes,
complete cds. NID: g1146195.

atgcaaactattgaaagcatcattataggtggcggtccatgcggattaagcgcggcaatt 

30 gaacaaaagaaaaaaggtattgaaacattagtaattgaaaaaggtaatgttgttgaatca 
atctataattatccaacacatcagacttttttctcatcaagtgataaattaagtatcggc 
gatattccttttattgttgaagatagtaagccaagacgtaatcaagcgcttgtatattat 
agggaagtcgttaaacatcatcaacttaacatacatccattcgaagaagttttaacagtt
aaaaaaataaacaataaatttgcaattacaactacaaaaggtgtatatgaatgtaaatat 

35 ttaactgttgctacgggttattatggtcaacataacactttagaagcggaaggggcagaa 
ttaccaaaagtattccattactttaaagaagcacatccgtattttaatcaaaatgttgtt
attattggaggcaaaaactctgctgttgatgctgccttagaattagaaaaagctggtgct 
aatgtaactgttttatatcgtggcgaacagtaccctaaagcaattaaaccatggatatta 
cccaatttcgaatcattagtcaatcacgaaaaaattacgatggaatttaatgcgacagta

40 accaaaattaccgatcattcagtgacttatgaaaaagatggtcaacttatagaaattgat
aatgactacgtttttgctatgattggttatcatccagattacgatttcttaaaaacaata 
ggtattgatatccataccaatgaatatggaactgctcctgtttataatcgagaaacattc 
gaaacaaacgtcgaaaattgttatatagctggtgttattgctgcgggtaatgatgcaaat 
actatttttatcgaaaatggtaaatatcatggtggtgtcattacacaaagcattttgaca 

45 aaaaaacaaacacctcttgaaacatag 

Sequence 1904 

MQTIESIIIGGGPCGLSAAIEQKKKGIETLVIEKGNVVESIYNYPTHQTFFSSSDKLSIG
DIPFIVEDSKPRRNQALVYYREVVKHHQLNIHPFEEVLTVKKINNKFAITTTKGVYECKY 
50 LTVATGYYGQHNTLEAEGAELPKVFHYFKEAHPYFNQNVVIIGGKNSAVDAALELEKAGA
NVTVLYRGEQYPKAIKPWILPNFESLVNHEKITMEFNATVTKITDHSVTYEKDGQLIEID
NDYVFAMIGYHPDYDFLKTIGIDIHTNEYGTAPVYNRETFETNVENCYIAGVIAAGNDAN 
TIFIENGKYHGGVITQSILTKKQTPLET* 

55 Sequence 1905 

Contig_0678_pos_4729_5376,

is similar to (with p-value 4.0e-52) 

>sp:sp|P38493|KCY_BACSU PROBABLE CYTIDYLATE KINASE (EC 2.7.4.14)
(CK) (CYTIDINE MONOPHOSPHATE KINASE) (CMP KINASE).
>gp:gp|U11687|BSU11687_4 Bacillus subtilis 168 jofA, jofB, MssA
homolog (jofC) and ribosomal protein S1 homolog (jofD) genes,
complete cds, and joeB gene, partial cds. NID: g533101.
>gp:gp|Z99115|BSUB0012_229 Bacillus subtilis complete genome
(section 12 of 21): from 2195541 to 2409220. NID: g2634478.
>gp:gp|Z99116|BSUB0013_1 Bacillus subtilis complete genome
(section 13 of 21): from 2395261 to 2613730. NID: g2634723.
>gp:gp|L47648|BACSERA_19 Bacillus subtilis phosphoglycerate
dehydrogenase (serA), ypaA, ferredoxin (fer), ypbB, recS, ypbD,
ypbE, ypbF, ypbG, ypbH, glutamate dehydrogenase (ypcA), ypdA,
ypdB, ypdC, spore cortex lytic enzyme (sleB), ypeB, ypfA, ypfB,
cytidine monophosphate kinase (cmk), ypfD, ypgA, yphA, yphB,
yphC, NAD+ dependent glycerol-3-phosphate dehydrogenase (glyc),
yphE and yphF genes, complete cds. NID: g1146195.
>gp:gp|L47648|BACSERA_19 Bacillus subtilis phosphoglycerate
dehydrogenase (serA), ypaA, ferredoxin (fer), ypbB, recS, ypbD,
ypbE, ypbF, ypbG, ypbH, glutamate dehydrogenase (ypcA), ypdA,
ypdB, ypdC, spore cortex lytic enzyme (sleB), ypeB, ypfA, ypfB,
cytidine monophosphate kinase (cmk), ypfD, ypgA, yphA, yphB,
yphC, NAD+ dependent glycerol-3-phosphate dehydrogenase (glyc),
yphE and yphF genes, complete cds. NID: g1146195.

atgagttcaattaatattgcactagatggcccagctgctgcaggtaagagtacaattgct 
aaacgtgtagccagtcgtctatcaatgatatatgttgatacaggagcaatgtatcgtgcc 

25 attacatataaatatttacaaaatggcaaacccgaaaattttgattatctgattaataac 
actaaacttgagcttacttatgatgaagtaaaagggcaaagaatcttactagataatcaa 
gacgtcactgattatttaagagaaaatgatgtaacacatcacgtatcttatgttgcatct 
aaagaaccagtgcgttcatttgcagtgaaaatacaaaaagaattagctgctaaaaaaggt 
atcgttatggatggccgagatattggtacagttgtattaccagatgccgaattaaaagtt 

30 tatatgattgcatctgttgctgaacgtgctgaacgtcgacaaaaagagaatgagcaacgt 
ggcattgaatcaaatttagaacaattaaaggaggaaattgaagcacgagatcattatgat 
atgaatcgtgaaatttcgccattacaaaaagccgaagatgctattacacttgatacaact
ggcaaatctatagaagaggtaacaaatgaaatattatctctactttaa 

35 Sequence 1906 

MSSINIALDGPAAAGKSTIAKRVASRLSMIYVDTGAMYRAITYKYLQNGKPENFDYLINN
TKLELTYDEVKGQRILLDNQDVTDYLRENDVTHHVSYVASKEPVRSFAVKIQKELAAKKG 
IVMDGRDIGTVVLPDAELKVYMIASVAERAERRQKENEQRGIESNLEQLKEEIEARDHYD 
MNREISPLQKAEDAITLDTTGKSIEEVTNEILSLL* 

40 

Sequence 1907 

Contig_0678_pos_5860_7038,

is similar to (with p-value 1.0e-92) 

>sp:sp|P38494|RS1H_BACSU 30S RIBOSOMAL PROTEIN S1 HOMOLOG.
>gp:gp|U11687|BSU11687_5 Bacillus subtilis 168 jofA, jofB, MssA
homolog (jofC) and ribosomal protein S1 homolog (jofD) genes,
complete cds, and joeB gene, partial cds. NID: g533101.
>gp:gp|Z99115|BSUB0012_228 Bacillus subtilis complete genome
(section 12 of 21): from 2195541 to 2409220. NID: g2634478.
>gp:gp|L47648|BACSERA_20 Bacillus subtilis phosphoglycerate
dehydrogenase (serA), ypaA, ferredoxin (fer), ypbB, recS, ypbD,
ypbE, ypbF, ypbG, ypbH, glutamate dehydrogenase (ypcA), ypdA,
ypdB, ypdC, spore cortex lytic enzyme (sleB), ypeB, ypfA, ypfB,
cytidine monophosphate kinase (cmk), ypfD, ypgA, yphA, yphB,
yphC, NAD+ dependent glycerol-3-phosphate dehydrogenase (glyc),
yphE and yphF genes, complete cds. NID: g1146195.
>gp:gp|L47648|BACSERA_20 Bacillus subtilis phosphoglycerate
dehydrogenase (serA), ypaA, ferredoxin (fer), ypbB, recS, ypbD,
ypbE, ypbF, ypbG, ypbH, glutamate dehydrogenase (ypcA), ypdA,
ypdB, ypdC, spore cortex lytic enzyme (sleB), ypeB, ypfA, ypfB,
cytidine monophosphate kinase (cmk), ypfD, ypgA, yphA, yphB,
yphC, NAD+ dependent glycerol-3-phosphate dehydrogenase (glyc),
yphE and yphF genes, complete cds. NID: g1146195.

atgactgaagaattcaatgaatcaatgattaatgatattaaagaaggtgacaaagtcact 
gttgaagttcaacaagtagaggataaacaagttgttgtgcatattaatggtggcaaattt 
aatggaattattcctattagccagctttcaacacatcatatcgaaaaccctagtgaagtt 
gtaaaagtcggtgatgaagtcgaagcatatgtcactaaaatcgagttcgacgaagaaaat 

10 gatactggggcatacattttatcaaaaagacaacttgaaactgaaaaatcttatgaatat 
ttacaagaaaaactagataacgatgaagtgattgaagctgaagttactgaagtagttaaa 
ggtggtttagtcgttgacgttggtcaaagagggtttgtacctgcttctctaatttcaact 
gatttcattgaagatttttctgtattcgatggtcaaacaatccgtattaaagtggaagaa 
cttgatcctgaaaacaatagagtcattttaagccgtaaagctgtggaacagttagaaaac 

15 gacgctaaaaaagcttcaatattagattctttaaatgaaggcgatgttattgatggtaaa 
gttgctcgattaactaactttggtgctttcattgatattggtggcgtagatggtttagtt 
cacgtttctgaattatctcatgaacatgttcaaacaccagaagaagttgtgtcagtaggt
gaagcagtcaaagttaaagttaaatctgtagaaaaagattctgaacgtatttctttatct 
attaaagacactttaccaacaccatttgaaaacattaaagggaaatttcacgaagatgat 

20 gttattgaaggtactgtagtacgtttggcgaactttggcgcattcgtagaaattgctcca 
tccgtccaaggtttagtgcatatttctgaaatcgatcataaacatatcggttctcctaac 
gaagtattagaacctggacaacaagttaatgtaaaaatattaggtatcgatgaagataat 
gaaagaatttcattatcaatcaaagcaacgttacctaaagaaaatgtcattgaaagtgac
gcatccacaactcaatcatatcttgaagatgataatgatgaagataaaccaacattaggc 

25 gatgtttttggtgataaatttaaagaccttaagttttaa 

Sequence 1908 

MTEEFNESMINDIKEGDKVTVEVQQVEDKQVVVHINGGKFNGIIPISQLSTHHIENPSEV 
VKVGDEVEAYVTKIEFDEENDTGAYILSKRQLETEKSYEYLQEKLDNDEVIEAEVTEVVK
30 GGLVVDVGQRGFVPASLISTDFIEDFSVFDGQTIRIKVEELDPENNRVILSRKAVEQLEN 
DAKKASILDSLNEGDVIDGKVARLTNFGAFIDIGGVDGLVHVSELSHEHVQTPEEVVSVG
EAVKVKVKSVEKDSERISLSIKDTLPTPFENIKGKFHEDDVIEGTVVRLANFGAFVEIAP
SVQGLVHISEIDHKHIGSPNEVLEPGQQVNVKILGIDEDNERISLSIKATLPKENVIESD 
ASTTQSYLEDDNDEDKPTLGDVFGDKFKDLKF* 

35 

Sequence 1909 

Contig_0678_pos_7112_8380,

is similar to (with p-value 0.0e+00) 

>gp:gp|D21131|STASRM551A_1 Staphylococcus aureus gene for a
participant in homogeneous expression of high-level methicillin
resistance, complete cds. NID: g531264.

gtgataattggaggttcaagtttagattcatctcaattattacaagctctatacgaaaca 
ttgtatatggtgactgtatcacttgtaatcggtgctttaataggtatacctcttggcatc 
ttgttagtggtaactagaaaaaacggtatatggtcgaatacaatattgcatcaagtatta 

45 aatcctatcattaatattttaagatcaattccgttcattattttattaatagccatagtg 
ccttttattattattgtaatatcaaaaaaattagatttagtagatcgtcctaatttcaga 
aaagtacatacgaaacctatctcagtgatgggaggaacggtcattttattttctttctta 
atagggatttggctcggacaccctattgaacgtgaggttaaaccgcttatattaggtgca 
attacaatgtatatggttggattgattgatgatatttacgatctaagaccttatttaaag 

50 ttagcaggtcaaattgttgcagctttaattgttacgttttatggaattacaatagacttt 
atttcattgccaattggtccaacgattcattttggcatattcagcattcctattacagta 
atatggattgtagcaattaccaatgctattaatcttatcgacggacttgatggacttgcc 
tcaggcgtctcagcattggcattaatgactattggattcatcgctattttacaagcgaac
atatttattatcatgatttgctgtgtacttttagggtctttacttggtttcttattctat 

55 aactttcacccagcgaaaattttcctaggtgatagtggtgcattaatgataggatttatt 
atcggtttcttatccttactcggctttaagaatatcacatttattgcattattctttcct 
atagttatattagcggtgccatttattgatacattatttgcaatgattcgtcgaatgaaa 
aaagggcaacatataatgcaagcggacaagtcacatttacatcataaattacttgcttta
ggatatacgcatagacaaaccgttttacttatttattcaatagcgattatgtttagttta
tctagtgttatcctctatttatcccaaccgttgggtgcacttatgatgttcattctcatt
gtctttacgattgagttgatcgttgaatttactggattaatagatgataattatcgacca 
atattaaatttaattacaaaaaaaggaaatggtaagcaacatcattatgatgagcatcac 
cgttcataa 

5 

Sequence 1910 

VIIGGSSLDSSQLLQALYETLYMVTVSLVIGALIGIPLGILLVVTRKNGIWSNTILHQVL
NPIINILRSIPFIILLIAIVPFIIIVISKKLDLVDRPNFRKVHTKPISVMGGTVILFSFL 
IGIWLGHPIEREVKPLILGAITMYMVGLIDDIYDLRPYLKLAGQIVAALIVTFYGITIDF
10 ISLPIGPTIHFGIFSIPITVIWIVAITNAINLIDGLDGLASGVSALALMTIGFIAILQAN 
IFIIMICCVLLGSLLGFLFYNFHPAKIFLGDSGALMIGFIIGFLSLLGFKNITFIALFFP 
IVILAVPFIDTLFAMIRRMKKGQHIMQADKSHLHHKLLALGYTHRQTVLLIYSIAIMFSL
SSVILYLSQPLGALMMFILIVFTIELIVEFTGLIDDNYRPILNLITKKGNGKQHHYDEHH
RS* 

15 

Sequence 1911 

Contig_0678_pos_4650_3682,

is similar to (with p-value 1.0e-43) 

>sp:sp|P30363|ASPG_BACLI L-ASPARAGINASE (EC 3.5.1.1)
(L-ASPARAGINE AMIDOHYDROLASE). >pir:pir|S18999|S18999
asparaginase (EC 3.5.1.1) - Bacillus licheniformis
>gp:gp|Z11497|BLANSAG_2 B.licheniformis ansA gene for
asparaginase. NID: g49270.
atgaaacgtctacttatcatacatactggtggcacaataagtatgtcacaagatcaaact 
aataaagtgataacgaatgaagaaaatccaatatcacaacatcaaaatatcattagtcaa 

25 tatgcagaggttgacgaaatcaatcttttaaatataccctcgccgcatatgacaatttcg 
aatgttgtgcgattaagagacgaaatcattacatattctaaagaaaatatatatgatgga
tttgtcattactcatggaacagatacacttgaggaaacagcttttttaatagatttatta 
attgatattcaagagcctatagtaattactggagcaatgagatcatccaatgaaattggt 
tccgatggtctctataattttatttctgctataagggttgcttcttcatctgaggctaat 

30 cataaaggtgttatggtcgtatttaatgatgagattcacactgctcgtaatgtgacaaag 
acacatacttcgaatattaatacatttcaaagtcctaatcaggggcctctaggtgtactt
accaagaatcgagtacaattttatcatcatccttacagacaaactacctaccaatatatc
gatgtaaatttacgtgttccacttgtaaaagcatacatgggtatggaagatgatgtacta 
tcattttattcacaacaacacgttgatggtatagtcatcgaagcactaggacaaggtaac 

35 cttccaaaaagttgtcttaatggactacagcaatgtctaaagaaaaacattcctctagtt 
ctcgtatctagatcattcaatggtattgttagtcctgtatatgcttatgaaggtggtggc
gcagatttgaaaaataatggtgttattttttcgaacggtttaaatggaccaaaggcaagg 
ctaaaattactagttggtttgagtcaagacatgactcaaaatcaattagagcgatatttc 
gaagagtaa 

40 

Sequence 1912 

MKRLLIIHTGGTISMSQDQTNKVITNEENPISQHQNIISQYAEVDEINLLNIPSPHMTIS 
NVVRLRDEIITYSKENIYDGFVITHGTDTLEETAFLIDLLIDIQEPIVITGAMRSSNEIG
SDGLYNFISAIRVASSSEANHKGVMVVFNDEIHTARNVTKTHTSNINTFQSPNQGPLGVL
45 TKNRVQFYHHPYRQTTYQYIDVNLRVPLVKAYMGMEDDVLSFYSQQHVDGIVIEALGQGN
LPKSCLNGLQQCLKKNIPLVLVSRSFNGIVSPVYAYEGGGADLKNNGVIFSNGLNGPKAR
LKLLVGLSQDMTQNQLERYFEE* 

Sequence 1913 
50 Contig_0681_pos_393_1478,

is similar to (with p-value 0.0e+00) 

>sp:sp|Q24803|ADH2_ENTHI ALCOHOL DEHYDROGENASE 2 (EC 1.1.1.1)
(ADH) / ACETALDEHYDE DEHYDROGENASE (EC 1.2.1.10) (ACDH).
>gp:gp|U04863|EHU04863_1 Entamoeba histolytica HM1:IMSS alcohol
dehydrogenase 2 (EhADH2) mRNA, complete cds. NID: g488429.

gtgctgagacgccgagaaaaccaaccacaaatcaaagtgtttaacgaagttgaacctaat
ccatcaactcatacagtctataaggggttagaaatgtttataaatttccaacctaatact 
attattgcactcggtggcggttcggcaatggatgcagccaaagcaatatggatgttcttt 






gagcatccagaaacttcattttttggggcaaaacaaaagttcttagatattcgtaaacgt
acttataaaattaccaaacctaaaaacgcaaaatttatatgtataccaacgacatcagga 
actggttctgaagtgacaccttttgcagtaattactgatagcgagacacacgttaagtat 
ccactagcagattatgcgttaactcccgatattgctatcgtcgatccacaattcgtatta 
5 agtgtacctaaagatgttgccgcagatacaggaatggatgttttgacacatgccattgaa 
tcttacgtctctgtcatggcttcagattatacaagaggcttaagcttacaagcaataaag 
ttaacttttgattatctaaaatcatcagttcaagaaaatgacaaacactcacgagaaaaa 
atgcataatgcttcaacaatggccggtatggcatttgccaatgcttttttaggaatttct
cattctatcgcacataaaattggtggtgaatatggtattccccacggcagaacaaatgct 

10 attttattaccacatgtcattcgctataatgccaaagatccacaaaaacatgcactgttt 
cctaaatatgatttctttagagcagatactgactatgctgacattgcaaaatttttagga 
ctcaaaggtaatacaactgaagaattagtggatgctctagctaatgcggtgtatgattta 
ggatgttcagttggtattgatatgaatttaaaatcacaaggcgtaactgaagagcttctt 
cactctactatagacagaatggctgaattagcatttgaagatcaatgtacaactgctaat 

15 ccaaaagaaccgctaattagtgaacttaaaggcattatcgaaacagcatatgattatgaa
agataa 

Sequence 1914 

VLRRRENQPQIKVFNEVEPNPSTHTVYKGLEMFINFQPNTIIALGGGSAMDAAKAIWMFF
20 EHPETSFFGAKQKFLDIRKRTYKITKPKNAKFICIPTTSGTGSEVTPFAVITDSETHVKY
PLADYALTPDIAIVDPQFVLSVPKDVAADTGMDVLTHAIESYVSVMASDYTRGLSLQAIK 
LTFDYLKSSVQENDKHSREKMHNASTMAGMAFANAFLGISHSIAHKIGGEYGIPHGRTNA 
ILLPHVIRYNAKDPQKHALFPKYDFFRADTDYADIAKFLGLKGNTTEELVDALANAVYDL 
GCSVGIDMNLKSQGVTEELLHSTIDRMAELAFEDQCTTANPKEPLISELKGIIETAYDYE
25 R* 

Sequence 1915 

Contig_0681_pos_4198_4911,

is similar to (with p-value 2.0e-32)

>gp:gp|AF008930|AF008930_3 Bacillus subtilis choline transport
system including ATPase (opuBA), transmembrane protein (opuBB),
choline binding protein precursor (opuBC) and transmembrane
protein (opuBD) genes, complete cds; and unknown gene. NID:
g3068551. >gp:gp|Z99121|BSUB0018_58 Bacillus subtilis complete
genome (section 18 of 21): from 3399551 to 3609060. NID:
g2635827.

gtgtcttcttcaataagtattttatacatattcgtaataattgacggttctgaacctagc 
ttgcctgcgaatgtgattttatcacctttttgcgctgccataggtatggcaatagctatg
ataatcacaattacaattgtccctaaagaaatgagcaattttttatatgataaacgttcc 

40 atgtatcttaaaataaaatcaaaaataatagctagaagtgcagctggaatagcacctatt 
aaaatgagtgcactattgttacgatcaatgcctaataatattaaatctcctagaccacca 
gcgcctattaaagctgcgagtgtagcagtaccaatgattaataccatagctgtgcgtatt 
cctgccatgataacaggcattgcaatagggagttcgactttggtcaatcttctaagtggt 
ttcattccaatgcctttagccgcttcaataagagagggatcgacctccttaatacctgtg 

45 tatgtattacgtagaataggaagtaacgcatatacaactaaggcgataattgccggaagt 
ctcccaattccgaaaattggtatcatcaaaccaagtagtgccagtgatggaattgtttgt 
agaacagctgcaatattcatgacgatttctgaaagtttttttgttttagtgagcaaaatt 
gcaattggtactgctataagagttgcaatgaaaagcgctataaaagaaagttga 

50 Sequence 1916 

VSSSISILYIFVIIDGSEPSLPANVILSPFCAAIGMAIAMIITITIVPKEMSNFLYDKRS
MYLKIKSKIIARSAAGIAPIKMSALLLRSMPNNIKSPRPPAPIKAASVAVPMINTIAVRI
PAMITGIAIGSSTLVNLLSGFIPMPLAASIREGSTSLIPVYVLRRIGSNAYTTKAIIAGS
LPIPKIGIIKPSSASDGIVCRTAAIFMTISESFFVLVSKIAIGTAIRVAMKSAIKES*

55 

Sequence 1917 

Contig_0681_pos_5858_4974,

is similar to (with p-value 2.0e-61) 

>gp:gp|AF008930|AF008930_2 Bacillus subtilis choline transport
system including ATPase (opuBA), transmembrane protein (opuBB),
choline binding protein precursor (opuBC) and transmembrane
protein (opuBD) genes, complete cds; and unknown gene. NID:
g3068551. >gp:gp|Z99121|BSUB0018_59 Bacillus subtilis complete
genome (section 18 of 21): from 3399551 to 3609060. NID:
g2635827.

gtgttaattggtccttcaggttgtggtaagacaacaacacttaaaatgatcaatcgttta 
attcctttaagcgaaggttacatatattttaaagataaaccaattagtgattatccagtt 
tacgaaatgcgttgggatattggatatgtactacaacaaattgcattatttcctcatatg 

10 actatcaaagaaaacattgcacaagtaccccaaatgaaaaagtggaaagataacgatatt 
gatttacgtgtggatgaattattgacaatggtaggtttaaatcctgagcaatttaaaaat 
agaaaacctgatgaactatcgggtggacaacgacaaagggttggtgtagtgcgagcatta 
gctgcagatcctcccgttattatcatggatgagccttttagtgctcttgacccaattagc 
agagagaagttgcaggacgatttgattcatttgcaaactcaaattaagaaaacaatagtt 

15 tttgtaacgcatgatattcaagaagcgatgaaactaggtgacagaatttgtttactcaat 
gaagggcgtgttgaacaaattgatacacctgatagttttagaacgcgacctaaaagtgac
tttgtaaagcaattcatgggtagtcatttggatacaacgcataatattgtgaatcaagtg 
aagattaaagatttaggaattaatcggtctgtagatgacaatgcaagcgttattcattat 
cctgaagtagatgctgagttgtacttaaataatatttatgaagatttgtctcattatgat 

20 gcagttgtggtcaacgataaagaacaacatcgtcgatatttgttaaatagagaagatgtg 
tttacttatttatctcttaataaggaggaagcgacacatgaatga 

Sequence 1918 

VLIGPSGCGKTTTLKMINRLIPLSEGYIYFKDKPISDYPVYEMRWDIGYVLQQIALFPHM 
25 TIKENIAQVPQMKKWKDNDIDLRVDELLTMVGLNPEQFKNRKPDELSGGQRQRVGVVRAL
AADPPVIIMDEPFSALDPISREKLQDDLIHLQTQIKKTIVFVTHDIQEAMKLGDRICLLN 
EGRVEQIDTPDSFRTRPKSDFVKQFMGSHLDTTHNIVNQVKIKDLGINRSVDDNASVIHY 
PEVDAELYLNNIYEDLSHYDAVVVNDKEQHRRYLLNREDVFTYLSLNKEEATHE*

30 Sequence 1919 

Contig_0681_pos_4819_3467,

is similar to (with p-value 5.0e-36) 

>gp:gp|AF008930|AF008930_4 Bacillus subtilis choline transport
system including ATPase (opuBA), transmembrane protein (opuBB),
choline binding protein precursor (opuBC) and transmembrane
protein (opuBD) genes, complete cds; and unknown gene. NID:
g3068551. >gp:gp|Z99121|BSUB0018_57 Bacillus subtilis complete
genome (section 18 of 21): from 3399551 to 3609060. NID:
g2635827.

40 atgaatattgcagctgttctacaaacaattccatcactggcactacttggtttgatgata
ccaattttcggaattgggagacttccggcaattatcgccttagttgtatatgcgttactt 
cctattctacgtaatacatacacaggtattaaggaggtcgatccctctcttattgaagcg 
gctaaaggcattggaatgaaaccacttagaagattgaccaaagtcgaactccctattgca
atgcctgttatcatggcaggaatacgcacagctatggtattaatcattggtactgctaca 

45 ctcgcagctttaataggcgctggtggtctaggagatttaatattattaggcattgatcgt 
aacaatagtgcactcattttaataggtgctattccagctgcacttctagctattattttt 
gattttattttaagatacatggaacgtttatcatataaaaaattgctcatttctttaggg 
acaattgtaattgtgattatcatagctattgccatacctatggcagcgcaaaaaggtgat 
aaaatcacattcgcaggcaagctaggttcagaaccgtcaattattacgaatatgtataaa 

50 atacttattgaagaagacacagatgatactgtagaagtcaaagatggcatgggtaaaacc 
tcattcttatttaatgcgcttaagtcagatgaaattgatggttatttagaatttacaggt 
actgtattaggtgaattaacgaaagaagatttaaagtctaaaaaagaaaacgatgtatat
caacaagcaaagtctagtttagaaaaaaaatatgatatgacaatgcttaaaccgatgaaa 
tataataatacgtatgcattagctgtaaaacgtgactttgcaaaaaaatatcaaattaag 

55 acaataggtgatttacgcaaggtagaagataaacttaaaccaggttttacattggaattt 
aatgatagaccagatggatacaaagctgttaaaaaaacgtatcatcttaatctttctaat 
gttaaaactatggaacctaaattacgttatactgcagttaaaaagggagatattaatctc 
atagacgcatactctactgatgcagaattaaaacaatataacatggtagtattaaaagat 
gatcaacatgtatttcctccataccaaggagcaccgctatttaaagaaaaatatttaaaa
gaccatcctgaagttaaaaaaccgctcaataaattggcgaatagaatcacagatgaagaa
atgcaagaaatgaactataaggtaacagtgaagaaagaggatccttataaagtagcaaga 
gaatacttagaaaaagaaaaattaataaaataa 

5 Sequence 1920 

MNIAAVLQTIPSLALLGLMIPIFGIGRLPAIIALVVYALLPILRNTYTGIKEVDPSLIEA
AKGIGMKPLRRLTKVELPIAMPVIMAGIRTAMVLIIGTATLAALIGAGGLGDLILLGIDR 
NNSALILIGAIPAALLAIIFDFILRYMERLSYKKLLISLGTIVIVIIIAIAIPMAAQKGD
KITFAGKLGSEPSIITNMYKILIEEDTDDTVEVKDGMGKTSFLFNALKSDEIDGYLEFTG 
10 TVLGELTKEDLKSKKENDVYQQAKSSLEKKYDMTMLKPMKYNNTYALAVKRDFAKKYQIK 
TIGDLRKVEDKLKPGFTLEFNDRPDGYKAVKKTYHLNLSNVKTMEPKLRYTAVKKGDINL
IDAYSTDAELKQYNMVVLKDDQHVFPPYQGAPLFKEKYLKDHPEVKKPLNKLANRITDEE 
MQEMNYKVTVKKEDPYKVAREYLEKEKLIK* 

15 Sequence 1921 

Contig_0681_pos_2086_1547,

putative peptide of unknown function 

atgacgcgacagagaatcgccattgatatggatgaagtgcttgctgatacattgggtgct 
gttgttaaagcggtcaatgaacgagcggatttaaatatcaaaatggaatcattaaacggt 

20 aaaaaattaaaacatatgatacccgagcatgaggggttagtcatggatattttaaaagaa
cctggattctttagaaatttagatgtaatgccgcatgctcaagaagttgtaaaacaactc 
aatgagcattacgacatatacatagccacagcagcgatggatgttccaacctcttttcat 
gacaaatatgaatggttattagaatactttccttttttagatccgcaacattttgtattt 
tgtggtagaaagaatattattcttgcagattatcttattgatgataatccaaagcaattg 

25 gaaatttttgaagggaaatcaattatgtttactgcttctcataatgttaatgaacataga 
tttgaacgcgtaagtggttggagagatgtaaagaattattttaattcaattgaaaaatag 



Sequence 1922 

30 MTRQRIAIDMDEVLADTLGAVVKAVNERADLNIKMESLNGKKLKHMIPEHEGLVMDILKE
PGFFRNLDVMPHAQEVVKQLNEHYDIYIATAAMDVPTSFHDKYEWLLEYFPFLDPQHFVF
CGRKNIILADYLIDDNPKQLEIFEGKSIMFTASHNVNEHRFERVSGWRDVKNYFNSIEK*



35 Sequence 1923 

Contig_0683_pos_7975_7574,

putative peptide of unknown function 

atgataaaaaaaatagaacacaatcgcaaaaacaaacaaaatgatacttcaaatcaaaat 
cgtgatactaatcaacaccaagaccaaactcaaccaacaaataatgactataacaacgat 
40 aatcaatcaggtactgaacaaccagcacaacaacctaactaccatcaatacccaaataat 
aatcaacagtctggttcaaataaaaataactcttcagaaaataacaaacagaaaccgaat 
cagaacaaaactaatcaatcatatcatcaaccagcacaatcaacaccacaacagtcgtca 
caacataataatcaatctgattcacaacaaaatggcaactcaaataataattccaacaat 
caaaatcatggaacaaatgataaacagaataaaaatcgttaa 

45 

Sequence 1924 

MIKKIEHNRKNKQNDTSNQNRDTNQHQDQTQPTNNDYNNDNQSGTEQPAQQPNYHQYPNN 
NQQSGSNKNNSSENNKQKPNQNKTNQSYHQPAQSTPQQSSQHNNQSDSQQNGNSNNNSNN 
QNHGTNDKQNKNR* 
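
Each peptide entry can be checked against the nucleotide entry that precedes it by translating the reading frame codon by codon. The Python sketch below is an illustration added for that purpose and is not part of the original listing. It assumes the standard genetic code, translates every codon at face value (so an initial gtg is written as V, as the peptide entries in this listing do), and writes a stop codon as `*`. The function name `translate` is hypothetical.

```python
from itertools import product

BASES = "tcag"
# Standard genetic code, codons enumerated in t, c, a, g order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a nucleotide string in frame 1; stop codons become '*'."""
    # Drop whitespace so that line-wrapped entries can be pasted directly.
    dna = "".join(dna.split()).lower()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Last printed line of Sequence 1923.
print(translate("caaaatcatggaacaaatgataaacagaataaaaatcgttaa"))
# prints: QNHGTNDKQNKNR*
```

The output matches the last line of Sequence 1924. The same check can help locate character-recognition errors in other entries, since a mismatch between the translation and the printed peptide points to the affected line.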

50 

Sequence 1925 

Contig_0683_pos_6967_6455,

putative peptide of unknown function 

atggttggaacggtgttaagtggttttgaatataaagcacaaaaagaaaaatatgataac 
55 ttaacaaagttctttaaagacaatgaagaaaaatatcaatataccggttttactaaagaa 
gcaatacataagacacagaatgttggatatcaaaatgagtattattatttagcaggtaac 
gttactaatattaataattatagaaaatattatgaacctttaataaaaaaagattctaag 
aatttcaaagaaggcatgaaaaaagcaaatgaagcaacaaatttcaaagccaaaattgaa 
gttgtttcaacattattcagtactaaatctgatttcactaaaaataactctaagaaagat
ttattattcttaagtgatgatttatatcattacaaagaaaaacctgaaaacacaaacata
actttacaattaagtgagccaaaaattaattctacacgcgcattttatgatgctaacaac 
ccattagaatatggagtgcataaacatgagtaa 

5 Sequence 1926 

MVGTVLSGFEYKAQKEKYDNLTKFFKDNEEKYQYTGFTKEAIHKTQNVGYQNEYYYLAGN
VTNINNYRKYYEPLIKKDSKNFKEGMKKANEATNFKAKIEVVSTLFSTKSDFTKNNSKKD
LLFLSDDLYHYKEKPENTNITLQLSEPKINSTRAFYDANNPLEYGVHKHE* 

10 Sequence 1927 

Contig_0683_pos_5976_5449,

putative peptide of unknown function 

atgatggtaggaactgtgttaagcggttttgagtatagagcaaataaggaaaaaatggat 
aacttagaaaaatacctcaaagataaagaagataaatatcactatactggattcaccgat 

15 gaagcaataactaaaactcaaaatataggttatcaaaataattacttttacattactact 
agttccacgaaattacgagattatagaaaacattttgaacctttaattaaagaaagtgac 
gatgattttaaaaagcatatgaaacaattaaagtctaaaaaagatacgtatattaataca 
gaaataacgactacacttttcagtactctggacgaatatgacgaaaaaatcattagaaaa
aatactttatccatggctaaagaaatgagaaaagagccatctattccacataatttcaca 

20 ttccacttattatttagcaataacaaattaaaaatcaacgatccaaacataagtaacaat 
caaattaatgagtatagggtgttcgaccatgacggatttaaaaattaa 

Sequence 1928 

MMVGTVLSGFEYRANKEKMDNLEKYLKDKEDKYHYTGFTDEAITKTQNIGYQNNYFYITT 
25 SSTKLRDYRKHFEPLIKESDDDFKKHMKQLKSKKDTYINTEITTTLFSTLDEYDEKIIRK 
NTLSMAKEMRKEPSIPHNFTFHLLFSNNKLKINDPNISNNQINEYRVFDHDGFKN*

Sequence 1929 

Contig_0683_pos_3077_1491,

30 is similar to (with p-value 2.0e-17) 

>sp:sp|P46321|CELR_BACSU PUTATIVE CEL OPERON REGULATOR.
>pir:pir|S57758|S57758 probable cel operon regulator - Bacillus
subtilis >gp:gp|Z49992|BSCELABCD_1 B.subtilis celA, celB, celC,
celD and ywaA genes. NID: g895746. >gp:gp|Z99123|BSUB0020_155
Bacillus subtilis complete genome (section 20 of 21): from
3798401 to 4010550. NID: g2636240. >gp:gp|D83026|D83026_60
Bacillus subtilis genome sequence covering lic-cel region. NID:
g1783231.

atggaaaagcatacattcactatcaaagagctatcttcaacttaccatttaacaaaatca 

40 aaagtgattgattatgtaacacgtatacaaacgtgggctataaaatttgatatttattta
tcaattaaaaagaagcaaggtatcatgatcgatgcgagtacaacgagtatcagtaatgct 
gtacttcatatcaatcaacttacagacgatgactttaaagttgaaaaccttattttacaa 
gagttacctcaagcccatactagaaaaataaaacaaattatctcaaagcatatagataat 
catcaattatcaacttctgaaaataaaatacaacaattacttgtgcatctaattttaatt 

45 atcaaacattctcaaccagaggaagaagattggagcactgatacagaatctttaactatt
gcgaaaaagtgtataaaagatatcaatgaaacccttggatatcaacttaacaataaaaca 
agtgaatgcttttccttttttattagctaccatttcaataagtttgatttagggatccaa 
caactatttattcaaagttatatcgatcgactcattgaattaatggagcaacatattggt 
tttcccttttcacaagatacaattttaaaagataatatgaacgtccactttagtcgtaca 

50 tatttgcgattaatgagtcatgtttatctaaataatccattaacaagtcaaatcaaacga 
ctatatccctttgtctttaatacactatatgatagtattcgacaattatcacaagatacc
aatatccaattaagcgaagatgagattgcctttttaactatacattttcagtcttctatc 
gaacgccataagtcatcacatattcatgttgtaattgcttgttattatggcttaggcatt
tcaacgttgcttgctgagaaaatcaaacaacttaatcatgcaatacagatcgtagataca 

55 ttaaaacttgaagatattaataactatcattttgaagggattgacttattaattactact
cacgactttgatacaagtcaacttttacaaatacctaaagtcatacaagtatcaccttta
ttttcagatgaagatgctaaaaaaatcgaattctttgtaaaagccatgcaaaacccatta 
tcaaaagatgatatattatcaaaaattcagttgagtgttgagtccaatttcaaactgaat 
cattcaaatcacattcttccaatttttgagaaatccaaagaaattttagattatcatcat
gcaactctagatggctatatagaaagtgccatagatcgcgaaaaacaatcttcaacatat
ataggtaaagggatagcactcccacacggcaaccctgaaaaagtactgaaatcacacatg 
attatatttaaaccttctcaacctataacatggaaacaacatgaagttaaacttgttttc 
tttttagcaatgagtaaaaaagatttaaatattaaccgtaaaattatacaatcaattgct 
5 caattagaagaagatgacatccatcaattatgtcttttagatgatttacaactaaaaaac 
actttgtatgcacgttttaaagaataa 

Sequence 1930 

MEKHTFTIKELSSTYHLTKSKVIDYVTRIQTWAIKFDIYLSIKKKQGIMIDASTTSISNA
10 VLHINQLTDDDFKVENLILQELPQAHTRKIKQIISKHIDNHQLSTSENKIQQLLVHLILI 
IKHSQPEEEDWSTDTESLTIAKKCIKDINETLGYQLNNKTSECFSFFISYHFNKFDLGIQ
QLFIQSYIDRLIELMEQHIGFPFSQDTILKDNMNVHFSRTYLRLMSHVYLNNPLTSQIKR
LYPFVFNTLYDSIRQLSQDTNIQLSEDEIAFLTIHFQSSIERHKSSHIHVVIACYYGLGI 
STLLAEKIKQLNHAIQIVDTLKLEDINNYHFEGIDLLITTHDFDTSQLLQIPKVIQVSPL 
15 FSDEDAKKIEFFVKAMQNPLSKDDILSKIQLSVESNFKLNHSNHILPIFEKSKEILDYHH 
ATLDGYIESAIDREKQSSTYIGKGIALPHGNPEKVLKSHMIIFKPSQPITWKQHEVKLVF
FLAMSKKDLNINRKIIQSIAQLEEDDIHQLCLLDDLQLKNTLYARFKE*

Sequence 1931 
20 Contig_0684_pos_1289_1969,

is similar to (with p-value 3.0e-25) 

>sp:sp|Q49435|Y442_MYCGE HYPOTHETICAL PROTEIN MG442.
>pir:pir|H64248|H64248 hypothetical protein homolog MG442 -
Mycoplasma genitalium (SGC3) >gp:gp|U39731|U39731_1 Mycoplasma
genitalium BS17, pilB_2, rpL19, trmD genes from bases 546767 to
554372 (section 53 of 56) of the complete genome. NID: g1046159.
>gp:gp|U39726|U39726_4 Mycoplasma genitalium section 48 of 51 of
the complete genome. NID: g3845031.
atgacgaatttaaaagaattagaaaaatgggaaacttattttaaagatgaaggtttctat 

30 ccggtagctgtagatgcaaaacatggcaagaatcttaaaaatgttgaagttgaagctata 
aaagcaactcaagaaaaatttgatcgtgaaaaagctaaaggtttaaaacctagagcgata 
agagctatgattgtaggcattcctaatgtaggaaaatcaacacttatcaataagttagca 
aaacgtagtatcgccgaaactggaaataaaccaggagtaacaaaacagcaacaatggatt
aaagttggaaagtctcttcaattactagatacaccaggtattttatggcctaaattcgaa 

35 gatgaagaagtcggtaaaaaattaagtttaactggtgcaattaaggatagtatcgttcat
ttagatgaggtagctatttatggtttgaattttatgattaaacatgatgtttcagcttta 
aagagacattataatattgatacacatgaagacgctgagatactcgattggtttgatgca 
attggaagaagaaggggattgttacaaaaaggaaatgaagtagattatgaatctgtcatt 
gagttgatcatcaatgatatgagaaatgcaaaaattggaacttattgttttgatatttta 

40 aaagaaatgaagagtgaatga 

Sequence 1932 

MTNLKELEKWETYFKDEGFYPVAVDAKHGKNLKNVEVEAIKATQEKFDREKAKGLKPRAI
RAMIVGIPNVGKSTLINKLAKRSIAETGNKPGVTKQQQWIKVGKSLQLLDTPGILWPKFE 
45 DEEVGKKLSLTGAIKDSIVHLDEVAIYGLNFMIKHDVSALKRHYNIDTHEDAEILDWFDA
IGRRRGLLQKGNEVDYESVIELIINDMRNAKIGTYCFDILKEMKSE*

Sequence 1933 
Contig_0684_pos_1974_2744,
50 is similar to (with p-value 9.0e-41) 

>gp:gp|AF005098|AF005098_1 Lactococcus lactis RNAseH II (rnhB)
gene, partial cds, positive regulator GadR (gadR), GadC (gadC)
and glutamate decarboxylase (gadB) genes, complete cds. NID:
g2352483.

55 atgtctctaacaattaaagaaatcaaagaaaaactatctcgaattgaaacgttggaagag 
ttacataaacatgaagcaaataatgattcacgtaaaggtgttataaatgcgattaagtct 
agggaaaaaaatattcttaagcaacaagcattagaagagcactatttatccatgaatcaa 
tacgaaaacaacattatgtcctctaacagggatgcattaatttgtggaattgatgaggta 
gggcgtgggcccttggctggaccagttgtggcttgtgcagttattttagagaagaatcat
cattatattggtttagatgactctaaaaaagtgtctcccaaaaatagagcacgacttaat
caaaatttaaaagaaaatgtctatcaatatgcatatggcatagcgtcctcagttgaaata 
gatgaattgaacatttatcgggcaactcaattagctatgctacgtgctataaatcaatta 
gatgttacacctacacatttattaatagacgcaatgacactagatattgatattccacaa 
5 acctcaattattaaaggtgatgctaaaagtgtgtctatcgcagcagcaagtatcatggct 
aaagaataccgtgatcaatatatgagacaactatctaaacagtttccagaatatggtttt 
gataaaaatgcaggttatggaactaagcaacatttaaaggctattgatcaagtgggcata 
atcaatgaacatcgtcaatcatttgaaccaattaaatcaatgatgaaataa 

10 Sequence 1934 

MSLTIKEIKEKLSRIETLEELHKHEANNDSRKGVINAIKSREKNILKQQALEEHYLSMNQ 
YENNIMSSNRDALICGIDEVGRGPLAGPVVACAVILEKNHHYIGLDDSKKVSPKNRARLN
QNLKENVYQYAYGIASSVEIDELNIYRATQLAMLRAINQLDVTPTHLLIDAMTLDIDIPQ
TSIIKGDAKSVSIAAASIMAKEYRDQYMRQLSKQFPEYGFDKNAGYGTKQHLKAIDQVGI 

15 INEHRQSFEPIKSMMK* 

Sequence 1935 
Contig_0684_pos_2945_4018,
is similar to (with p-value 0.0e+00) 
>gp:gp|Z99112|BSUB0009_79 Bacillus subtilis complete genome
(section 9 of 21): from 1598421 to 1807200. NID: g2633902.
>gp:gp|AJ000975|BSYLQGCOD_3 Bacillus subtilis ylqg to codV gene
region. NID: g2462964.

gtggaaaaagcgaaagaattaaattcagacgtatatgtggttaaagcgcaaattcacgct 

25 gggggtagaggtaaagcaggcggcgtgaaaattgctaaatcattatctgaagtcgaaacg 
tacgcaaatgaactgctaggtaaacaattggtcacacatcaaactgggccagagggcaaa
gaggtcaaacgtttatatatcgaagaaggatgcgatatccaaaaagaatattatgttggt 
tttgttattgatcgtgctactgataaagtgactttgatggcatcagaagaaggtggaact 
gaaattgaagaggttgcagctcaaacacctgaaaagattttcaaagaaacaattgatcca 

30 gtagtaggattatcaccttaccaagcgcgacgtatcgcttttaatattaacattccaaaa 
gaatcagttggaaaagcaactaaatttttattagcactatataatgtctttatcgaaaaa 
gattgttctattgttgaaattaacccacttgttacaactggagacggtcaggtattggct 
ttagatgctaaattaaactttgatgataatgcattatttagacataaagatattttagaa 
ttacgagatttagaagaagaagatcctaaggaaatagaagcttctaaatatgatttatca

35 tacatcgctttagatggagatattggttgtatggttaatggcgcaggtttagccatggca 
actatggatacaattaatcattttggtggaaatccagccaacttcttagatgtaggtggc 
ggtgctacaaaagaaaaggtaactgaagcatttaaaattattttaggtgatgacaatgtt 
aaaggtatctttgtaaatatttttggtggaattatgaaatgtgatgttattgccgaaggt 
attgtagcagcggttaaagaagttgaactaacattaccattagttgttcgtttagaagga

40 actaatgtcgaacgtggtaaagcaatattaaacgaatcaggtttagctattgagccagca 
gcaactatggctgaaggtgctcaaaaaattgtgaaacttgttaaagaagcataa 

Sequence 1936 

VEKAKELNSDVYVVKAQIHAGGRGKAGGVKIAKSLSEVETYANELLGKQLVTHQTGPEGK
45 EVKRLYIEEGCDIQKEYYVGFVIDRATDKVTLMASEEGGTEIEEVAAQTPEKIFKETIDP
VVGLSPYQARRIAFNINIPKESVGKATKFLLALYNVFIEKDCSIVEINPLVTTGDGQVLA 
LDAKLNFDDNALFRHKDILELRDLEEEDPKEIEASKYDLSYI ALDGDIGCMVNGAGLAMA 
TMDTINHFGGNPANFLDVGGGATKEKVTEAFKIILGDDNVKGIFVNI FGGIMKCDVIAEG 
IVAAVKEVELTLPLVVRLEGTNVERGKAILNESGLAIEPAATMAEGAQKIVKLVKEA* 


Sequence 1937 

Contig_0684_pos_4121_4645,

is similar to (with p-value 1.0e-69) 

>gp:gp|Z99112|BSUB0009_80 Bacillus subtilis complete genome
(section 9 of 21): from 1598421 to 1807200. NID: g2633902. >
gp:gp|AJ000975|BSYLQGCOD_4 Bacillus subtilis ylqg to codV ge
ne region. NID: g2462964.

atgcttgattatgggactcaaattgttgcaggggtaacacctggtaaaggtggacaagtt 
gtagaaggtgttccagtatataacactgttgaagaagctaaaaatgaaacaggagctaat
gtatctgttgtatacgtaccagcaccattcgctgctgattcaattattgaagcagctgat
gccgatttagacatggttatttgtattactgaacatatacctgttgttgatatggttaaa 
gtaaaaagatatttacaaggtcgtaaaacacgtttagtaggaccaaactgtcctggtgtg 
ataactgccgacgagtgtaaaatcggtattatgccgggatatatccataaaaaaggccat 
gtcggtgtcgtgtctcgttctggtacattaacgtatgaggcagtgcatcaattaactgaa
gaaggtatcggtcaaacaactgctgtaggtatcggcggtgatccagtaaatgggactaac
tttatttgtctaatcatagacaaaagtgctcatctctcattctag

Sequence 1938 

MLDYGTQIVAGVTPGKGGQVVEGVPVYNTVEEAKNETGANVSVVYVPAPFAADSIIEAAD
ADLDMVICITEHIPVVDMVKVKRYLQGRKTRLVGPNCPGVITADECKIGIMPGYIHKKGH
VGVVSRSGTLTYEAVHQLTEEGIGQTTAVGIGGDPVNGTNFICLIIDKSAHLSF* 

Sequence 1939 
Contig_0684_pos_8692_9123,

is similar to (with p-value 2.0e-20) 

>gp:gp|U96107|SCU96107_3 Staphylococcus carnosus N5,N10-meth
ylenetetrahydromethanopterin reductase homolog, SceB precurs
or (sceB) and putative transmembrane protein genes, complete
cds, and putative Na+/H+ antiporter NhaC (nhaC) gene, parti
al cds. NID: g2735503.

atgaaattcaaaaaattattatctcgtattattatcgctacaatgattacatttactgga 
acactctcatatcaagctattgaacaaacgcatatttcccatgctgcacataattattat 
ggtaaaaaacaatgcacttggtgggcatttaaacgtcgtgctcaattaggtaaacctgta 
tcaaatcgatggggtaatgctaagaattggtatagcaatgcacgtcgatctggttatgca
actggacataagcctcgaaaatacgctgttatgcaatcaacgagaggctattatgggcac
gtagcagtggttgaaaaagtatataagaatggaaaaatcaaaatttctgaatataattat
aatgtgccattaggctacggcacacgcattattagtaaatcgtctgcacgaaactataat 
tatatttattaa 


Sequence 1940 

MKFKKLLSRIIIATMITFTGTLSYQAIEQTHISHAAHNYYGKKQCTWWAFKRRAQLGKPV
SNRWGNAKNWYSNARRSGYATGHKPRKYAVMQSTRGYYGHVAVVEKVYKNGKIKISEYNY
NVPLGYGTRIISKSSARNYNYIY*


Sequence 1941 

Contig_0684_pos_9983_0, 

is similar to (with p-value 1.0e-85)

>gp:gp|Z99122|BSUB0019_80 Bacillus subtilis complete genome
(section 19 of 21): from 3597091 to 3809700. NID: g2636029.
>gp:gp|Z92954|BSZ92954_7 B.subtilis yws[A,B,C,D,E,F,G] and g
erBC genes. NID: g1894764.

atgaataaattaggtcgtcgacgtctagtaatgctcatcgctatagtatttattattggg 
gcactgattcttgctgcatcaaccaatttagcattattaattattggacgtttaattatt 

ggtttagcagtgggtggttcgatgtctacggtacctgtttatttaagtgaaatggcaccg
acagaatatcgtggctcactaggttcacttaatcaactaatgattacaattggtatttta 
gcagcatatttagtcaactatgcatttgcggatattgagggttggcgttggatgctagga 
ttagcagttgttccatcggttattttacttgtgggtatttattttatgcctgagagtcca 
agatggttacttgaaaatagaaacgaagaagccgctcgtcaagtgatgaagattacttat 

gacgatagcgaaattgataaagaacttaaagagatgaaagaaattaacgctatctctgaa
tctacatggacagtcattaaatcaccatggttaggtagaatattaattgtaggttgtata
tttgctattttccagcaatttattggtatcaatgcagtcattttctattcatcttcaatc 
tttgctaaggctggactgggtgaagcggcgtctatattaggttcagttggtataggaacL 
attaatgttcttgtaacaatagttgccatttttgtagtagataagattgatcgtaaaaaa 

ttacttgttggtggtaatattggtatgattgcctcattattaattatggcaatcttaatt
tggacaattggaattgcttcatcagcgtggattattattgtttgtttatcattatttatt
gtattctttgggatttcttggggacctgttctatgggttatgctacctgaattattccca 
atgcgcgcacgtggcgctgctacgggcatttcagcgcttgtgctaaatatcggaacgctt 
atcgtgtcattgttcttcccaatatta

Sequence 1942

MNKLGRRRLVMLIAIVFIIGALILAASTNLALLIIGRLIIGLAVGGSMSTVPVYLSEMAP 
TEYRGSLGSLNQLMITIGILAAYLVNYAFADIEGWRWMLGLAVVPSVILLVGIYFMPESP
RWLLENRNEEAARQVMKITYDDSEIDKELKEMKEINAISESTWTVIKSPWLGRILIVGCI
FAIFQQFIGINAVIFYSSSIFAKAGLGEAASILGSVGIGTINVLVTIVAIFVVDKIDRKK
LLVGGNIGMIASLLIMAILIWTIGIASSAWIIIVCLSLFIVFFGISWGPVLWVMLPELFP
MRARGAATGISALVLNIGTLIVSLFFPIL

Sequence 1943

Contig_0684_pos_10911_10567, 

is similar to (with p-value 3.0e-26) 

>sp:sp|P46333|YXBC_BACSU HYPOTHETICAL METABOLITE TRANSPORT P
ROTEIN IN HTPG-IOLR INTERGENIC REGION.

atgcccgtagcagcgccacgtgcgcgcattgggaataattcaggtagcataacccataga
acaggtccccaagaaatcccaaagaatacaataaataatgataaacaaacaataataatc 
cacgctgatgaagcaattccaattgtccaaattaagattgccataattaataatgaggca 
atcataccaatattaccaccaacaagtaattttttacgatcaatcttatctactacaaaa 
atggcaactattgttacaagaacattaatagttcctataccaactgaacctaatatagac 

gccgcttcacccagtccagccttagcaaagattgaagatgaatag

Sequence 1944 

MPVAAPRARIGNNSGSITHRTGPQEIPKNTINNDKQTIIIHADEAIPIVQIKIAIINNEA
IIPILPPTSNFLRSILSTTKMATIVTRTLIVPIPTEPNIDAASPSPALAKIEDE* 


Sequence 1945 

Contig_0684_pos_9748_9200,

is similar to (with p-value 5.0e-44) 

>sp:sp|P13702|MVAA_PSEMV 3-HYDROXY-3-METHYLGLUTARYL-COENZYME
A REDUCTASE (EC 1.1.1.88) (HMG-COA REDUCTASE). >pir:pir|A44
756|A44756 hydroxymethylglutaryl-CoA reductase (EC 1.1.1.88)
- Pseudomonas sp. >gp:gp|M24015|PSEHMGCOA_1 P.mevalonii HMG
-CoA reductase (mvaA) gene, complete cds. NID: g151258.
atggaacgagcgtcagttcttgcacaagtagatatacatcgtgctgcaacacataacaaa 
ggtgtgatgaatggtatacacgctgtagtattggctacaggcaatgatacaagaggagtt
gaagcaagtgctcatgcatatgcaagcaaagatggtcattatagagggatagctacttgg 
gaatatgatcgctcacgtaataaattggttggaactattgaagttcctatgactttagcg 
acagtaggtggaggtacgaaagttttacctattgctaaagcctcattaaatttgcttaat 
gttgaaaatgcacaggaactagggcaagttgttgctgctgttggattagcacaaaatttc 
tctgcatgtagagcgctagtgtctgaggggatacaacaaggacatatgagtttacaatat
aaatcattagcgattgttgtaggtgcaaaaggcgaagaaattgcgcaagtagctgaagcg 
ctcaaatatgaatcacaagctaatactgccaaagctcaagaaatcttgatgaatataaga 
aagtcataa 

Sequence 1946

MERASVLAQVDIHRAATHNKGVMNGIHAVVLATGNDTRGVEASAHAYASKDGHYRGIATW
EYDRSRNKLVGTIEVPMTLATVGGGTKVLPIAKASLNLLNVENAQELGQVVAAVGLAQNF
SACRALVSEGIQQGHMSLQYKSLAIVVGAKGEEIAQVAEALKYESQANTAKAQEILMNIR 
KS* 


Sequence 1947 

Contig_0684_pos_7905_7147,

is similar to (with p-value 2.0e-44) 

>sp:sp|P39592|YWBI_BACSU HYPOTHETICAL TRANSCRIPTIONAL REGULA
TOR IN EPR-GALK INTERGENIC REGION. >pir:pir|S39679|S39679 hy
pothetical protein - Bacillus subtilis >gp:gp|X73124|BSGENR_
25 B.subtilis genomic region (325 to 333). NID: g413923. >gp
:gp|Z99123|BSUB0020_126 Bacillus subtilis complete genome (s
ection 20 of 21): from 3798401 to 4010550. NID: g2636240.

atggctgtccctttatttgaccggagtaaaagaagtttagtacttactgatgcaggtaaa
atttttttcaagaaatgtcaagaaatcatcgcactatatgataatttgcccactgaaatt 
aatagtttgtatggtttagaaacaggtcatatcactattagtatgtctgcagtgatgagc 
atgcgtaaatttattggcgtattaggagactttcatcaactttatccgaatattacgtac 
aacttaatcgaaagtggtggtaagacgactgaaaaccttatacttaatgatgaagtggat
attggtgtgacaacattgccagtagatcatcaaaaatttgaatgtatatctttaaacaaa 
gaagaactgactgtagttttaaataaagaacatcctttagcacaaaaatcttctattaaa 
atggaagaattagctgatgagaacttcattttatttaatgaagatttctatctcaacgat 
aaaattattgaaaatgcgaagaatgctggattcgtgccgaacatggcctcacaaatctca 
caatggaatgtgattgaaaatcttgtcattaatcaattaggtatttccatattgccagcc
actatagcacaattacttaatgatgacgtcaaaattgtacatttggaaaatgcacataca
acttgggagcttggtgtcgtttggaaaaaagataaacgtttaagtcatgctacaaataaa 
tggatagaatttttgaaagaaagattatccgaagaataa 

Sequence 1948

MAVPLFDRSKRSLVLTDAGKIFFKKCQEIIALYDNLPTEINSLYGLETGHITISMSAVMS 
MRKFIGVLGDFHQLYPNITYNLIESGGKTTENLILNDEVDIGVTTLPVDHQKFECISLNK 
EELTVVLNKEHPLAQKSSIKMEELADENFILFNEDFYLNDKIIENAKNAGFVPNMASQIS
QWNVIENLVINQLGISILPATIAQLLNDDVKIVHLENAHTTWELGVVWKKDKRLSHATNK
WIEFLKERLSEE*

Sequence 1949

Contig_0684_pos_6784_6443,

putative peptide of unknown function 

atgcttattacttttataggcacagaagttcaaaaattacttcatatacctctagcaggt
agtatcgtagggcttatgctttttttcctattgttacaatttaaaattgtacctgaatca 
tggattaatgtaggagcagactttttacttaaaacaatggttttcttctttatcccatca 
gtggtaggaattatggatgttgcatctaatatcacgatgaattatatattattctttatt 
gttattataattggtacatgccttgtagcactatcatcaggttatatcgctgaaaaaatg 

ctagaaaaaagcaatacacgtaaaggaactgatcactcatga

Sequence 1950 

MLITFIGTEVQKLLHIPLAGSIVGLMLFFLLLQFKIVPESWINVGADFLLKTMVFFFIPS 
VVGIMDVASNITMNYILFFIVIIIGTCLVALSSGYIAEKMLEKSNTRKGTDHS* 


Sequence 1951 

Contig_0684_pos_6275_5757,

putative peptide of unknown function 

atgaaaggtggtacctggattaaccatgttttaaacgctacagttgtatgtcttgcatac 
ccactttatcaaaataaaaagaaaataaaaaaatatttaacaattattttcacaagcgtg
ttgactggtgtagttctcaattttgtgttagtatttacaacgttgaaaatctttggttat 
tctaaagacacaattgttaccctgttacctagatcaattacagcagcagtaggtatagag 
gtttctcaagaattgggaggaacagatacaattactgtgctctttatcataactacaggt 
ttaatcggcagtattttaggttcaatgcttttacgtatgggaggttttaaatcttccatt 
gcgcgaggactaacttatgggaatgcttctcacgcatttggtaccgcaaaagcattagag
cttgatattgaatcaggagcgttcagttcaattggtatgattttaacagcagtcattagt 
tctgttctcataccagtactgattttattgttttactaa

Sequence 1952 

MKGGTWINHVLNATVVCLAYPLYQNKKKIKKYLTIIFTSVLTGVVLNFVLVFTTLKIFGY
SKDTIVTLLPRSITAAVGIEVSQELGGTDTITVLFIITTGLIGSILGSMLLRMGGFKSSI 
ARGLTYGNASHAFGTAKALELDIESGAFSSIGMILTAVISSVLIPVLILLFY* 

Sequence 1953 
Contig_0684_pos_5707_5195,

is similar to (with p-value 2.0e-21) 

>sp:sp|P42405|YCKG_BACSU HYPOTHETICAL 19.0 KD PROTEIN IN TLP
C-SRFAA INTERGENIC REGION (ORF10). >gp:gp|D30762|BACYCK_10 B
acillus subtilis DNA around 28 degrees region of chromosome
containing yckA-H genes. NID: g710627. >gp:gp|D50453|D50453_
49 Bacillus subtilis DNA for 25-36 degree region containing
the amyE-srfA region, complete cds. NID: g1805369.
atggcacaaattaaagcaaatgaagcattagttaaggcattacaaacctggaatattgat 
catttatatggtattcctggcgactcagtagatgctgttgttgatagcttacgtacggtc
agagatcaatttaaattctctgaagatgcttcaattaaagcagcagttgaagaagcgcat 
aaacatggaaaagcattgcttgttgatatgatagcagtgcaaaacttagaacaacgtgct 
aaagaactagatgagatgggtgcagactatatcgcagttcatacaggttacgacttacaa 
gctgaaggaaaatctccattagacagcttgcgtacagttaaatctgttatcaaaaactct 
aaggttgcagtagcaggtggtattaaaccagatactatcaaagatattgttgctgaagat
ccagatttagttattgttggtggcggtattgcgaatgctgacgatcctgtagaagcagca 
aaacaatgtagagcagctattgaaggtaaataa 

Sequence 1954 

MAQIKANEALVKALQTWNIDHLYGIPGDSVDAVVDSLRTVRDQFKFSEDASIKAAVEEAH
KHGKALLVDMIAVQNLEQRAKELDEMGADYIAVHTGYDLQAEGKSPLDSLRTVKSVIKNS
KVAVAGGIKPDTIKDIVAEDPDLVIVGGGIANADDPVEAAKQCRAAIEGK* 

Sequence 1955 
Contig_0687_pos_433_1173,

is similar to (with p-value 3.0e-23) 

>sp:sp|P30267|YKAA_BACFI HYPOTHETICAL 50.9 KD PROTEIN IN KAT
A 3'REGION (ORF A). >pir:pir|S27491|S27491 hypothetical prot
ein A - Bacillus firmus >gp:gp|L02548|BACKATA2_1 B.firmus OR
F A and ORF B, complete cds. NID: g143118.

gtgccggtaacgattaataataactcttcaattttattagatcattttgtcacatggata 
agtagcgcattacctcttttaactaagatattcataatgattatcattatactaggtgct 
atttatccatttattaaagggacatggaatcggaataccgttgaaacaatttttagttta 
tttaaagttttgggagtcattataggcgttttgttaatttttaacattgggccaagttgg 

ttacttaatgaacaaacgggaatgtatgtttttaactatttggtaattccggtaggatta
acagtacctgcaggaggcgcggtattagctttattagtaggatatggcttattagaattt 
gtaggtgtttatgcgcaaaaaattatgtacccgatatggaaaacgcctggacgttcagca 
gttaatgctttagcatcttttgttgctagttttgctgtgggtttacttataacgaataaa
gagtataaagaaggtaaattcacggaaaaacaagctgttatcatagcaaccggcttttct 

acagttactgtagcttttatgatagttattgctaaaaccttacacttaatggatatatgg
aatttatatttttggtctaccttgtttgttactgctgcagtaacagcttgtacagttagg 
atttggcctatcagtaaaattagcaacacatattatgatcagccatttatagaagaagat 
acaagcgaattaaaaggttaa 

Sequence 1956

VPVTINNNSSILLDHFVTWISSALPLLTKIFIMIIIILGAIYPFIKGTWNRNTVETIFSL
FKVLGVIIGVLLIFNIGPSWLLNEQTGMYVFNYLVIPVGLTVPAGGAVLALLVGYGLLEF
VGVYAQKIMYPIWKTPGRSAVNALASFVASFAVGLLITNKEYKEGKFTEKQAVIIATGFS 
TVTVAFMIVIAKTLHLMDIWNLYFWSTLFVTAAVTACTVRIWPISKISNTYYDQPFIEED 

TSELKG*

Sequence 1957 

Contig_0687_pos_1231_1665, 
is similar to (with p-value 5.0e-21) 
>sp:sp|P30267|YKAA_BACFI HYPOTHETICAL 50.9 KD PROTEIN IN KAT
A 3'REGION (ORF A). >pir:pir|S27491|S27491 hypothetical prot
ein A - Bacillus firmus >gp:gp|L02548|BACKATA2_1 B.firmus OR
F A and ORF B, complete cds. NID: g143118.

gtgatgaaaaatatttggttgaatttcaaagaaagtctgattatgactatgaatatctta 
cccaccatattatcaataggtttaatttgcttgttactcgcagaatatacagtgattttc
gattatttagcatatgttttttatccattaacttggatacttcaaataccagattccttt 
ttaactgcaaaaggcgcagctattggtataacagaaatgtttttgccttcattaattgta 
gtcgaagcaccattaatcactaaatttataattgctgttacttctgtttctacaattata
ttcttttcagctagtgtgcctagtattctctctactgatatacccatccgcataagagat
ttagtggttatatggtttgagagaactgtattgagtttaattatagtaacacctatcgca
tatatttttttataa 

Sequence 1958 

VMKNIWLNFKESLIMTMNILPTILSIGLICLLLAEYTVIFDYLAYVFYPLTWILQIPDSF
LTAKGAAIGITEMFLPSLIVVEAPLITKFIIAVTSVSTIIFFSASVPSILSTDIPIRIRD
LVVIWFERTVLSLIIVTPIAYIFL*

Sequence 1959 
Contig_0687_pos_2701_3702,

putative peptide of unknown function 

atgtccttatttaaagattttttcattgcgttatctaatcatacatatcttaataaaata 
gctaaaaagatgggacctcaaatgggagcaaatcgtgtagtcgcaggaaatacgattcat
caattaattgagactatacaatatttaaatgattataatatttcagttactgtcgactca 

ttgggtgaatttgttaatactagagaagaaagcattaaagctaaagaagagattttagaa
attatcgatgcaatatatagcaataatgttaaggcacatatgtcagtcaagataagtcaa 
cttggaagtgagtttgatttaaatcttgcttatgaaaacatgagagaaattttacttaaa 
gctgataagaatgggaagatgcatattaatattgatacagagaagtacgatagtctttct 
aaaattcaacatataattgatagattgaaaggtgaatttaaaaatgtgggtacagtcgtt 

caagcttatctgtatgaagccgatgatataattgataaatatcctgaattacgtttgaga
cttgtgaaaggtgcttataaggaagatgcgtcaatcgcttttcaatcaaaagaagaaatt 
gacgcaaattatattagaattattaaaaaacgactactaaattcaaagaactttacatcg 
gtggctacacatgacaatgaaataatcaaccaagtcaaacaatttatgaaggaaaatcat 
gtcagcaaagataaaatggaatttcaaatgttgtacggtttccgcacggaattagcacaa 

aaaatagctaatgaaggttatttttttacagtttatgtaccatacggtaatgattggttt
gcgtactttatgagaagactagcagaacggcctcaaaacttgtcattagctataaaagaa 
tttactaaacccaaaatcttaaaaaaggtaaccttgggtataggtatatttgcaacttta
ttgacgtctcttattcttggcattaaaagacataaaaaataa 

Sequence 1960

MSLFKDFFIALSNHTYLNKIAKKMGPQMGANRVVAGNTIHQLIETIQYLNDYNISVTVDS 
LGEFVNTREESIKAKEEILEIIDAIYSNNVKAHMSVKISQLGSEFDLNLAYENMREILLK
ADKNGKMHINIDTEKYDSLSKIQHIIDRLKGEFKNVGTVVQAYLYEADDIIDKYPELRLR
LVKGAYKEDASIAFQSKEEIDANYIRIIKKRLLNSKNFTSVATHDNEIINQVKQFMKENH
VSKDKMEFQMLYGFRTELAQKIANEGYFFTVYVPYGNDWFAYFMRRLAERPQNLSLAIKE
FTKPKILKKVTLGIGIFATLLTSLILGIKRHKK*

Sequence 1961 

Contig_0687_pos_8147_7746,

putative peptide of unknown function

atgtttttaacaaaaatggctaccgaatggagacatttttgtgcaggattcctcttaatg 
ttgaagtataacgacactacgcaaaattatgagttgtttgacatcgttattaatgcaacg 
ggctctaaaacacatctttctcaattagatgaggatgatcaattaattttaaacttagaa 
aatagacaaattgttcaacgtcatcctatgggtggcattcaaattatcccagaaacaaat

caagtcataagccccagatatggaaccttaaaaaatgtgattgcaattggacaaatgacc
aacggtgtcaataaacttagaaatggcgtaaagatgattgttaatcaagttgttgataca 
gtatctcaattatatataacacaggaaaatagaaataagtaa 

Sequence 1962 

MFLTKMATEWRHFCAGFLLMLKYNDTTQNYELFDIVINATGSKTHLSQLDEDDQLILNLE
NRQIVQRHPMGGIQIIPETNQVISPRYGTLKNVIAIGQMTNGVNKLRNGVKMIVNQVVDT
VSQLYITQENRNK* 

Sequence 1963 
Contig_0687_pos_7297_6266,

is similar to (with p-value 3.0e-59) 

>sp:sp|P17618|RIBG_BACSU RIBOFLAVIN-SPECIFIC DEAMINASE (EC 3
.5.4.-). >pir:pir|S45543|PN0100 ribG protein - Bacillus subt
ilis >gp:gp|L09228|BACDIA_10 Bacillus subtilis spoVA to serA
region. NID: g410114. >gp:gp|X51510|BSRIB_2 B.subtilis ribo
flavin biosynthesis operon ribG, ribB, ribA, ribH, and ribT
genes. NID: g40083. >gp:gp|Z99116|BSUB0013_40 Bacillus subti
lis complete genome (section 13 of 21): from 2395261 to 2613
730. NID: g2634723.

atggatgatgctattcaactagcaaaaatggtaaatggacaaacaggtgttaatccacca 
gtaggatccgttgttgttaaaaacggtaggattgtaggtttaggtgcacatttaaaaaag 
ggagataaacatgccgaagtacaagctattgaaatggcaggtttaaatacccaaggtgct 
accatatacgtttcattagaaccttgcacacaccatggttcaacaccaccttgtgtgcat 

aaaatcattgaagcgggcatatctaaggtcatctatgctgttaaagatactactttagta
agtaagggtgacgagattctgagagaagctggtatagaggttgaatttcaatataatgaa 
aatgcagctgcattataccgtgacttttttactgctaaaagaaacgaagttccagaagta 
actgtaaaggtctcatctagtctagatggtaaacaagcaacagactttaatgaaagtaag 
tggataacaaacaaagaagttaaagaagatgtttatcaattaagacatgagcatgatgca 

gttattactgggcgtagaaccattgaagcagacaatccattgtatacaaccagggttcct
gatggaaagcatccgattcgagttattctttctaagaaaggtcaactcgattttaatcaa 
caaatatttaaagatactgcatcggagatatggatttacactgaaaatgaaaaattaaaa 
acaaataaaagttttattaaaataataaatattagtaattgtgatacaacgacaatatta 
caagacttatatcaaagaggtattgggaaactgctagtcgaggcaggcccaaatattaca 

tctcaatttctccaatccaaacatctaaatgaactcattttatatatagccccgaaatta
attggtggttctggcaaacatcaattttataagactgacgaggtcattgatttgcctgaa 
gcaactcaatttgaaattgttgattccaagttaattaatcaaaatttaaaattgaaatta 
cgaaagaagtga 

Sequence 1964

MDDAIQLAKMVNGQTGVNPPVGSVVVKNGRIVGLGAHLKKGDKHAEVQAIEMAGLNTQGA
TIYVSLEPCTHHGSTPPCVHKIIEAGISKVIYAVKDTTLVSKGDEILREAGIEVEFQYNE
NAAALYRDFFTAKRNEVPEVTVKVSSSLDGKQATDFNESKWITNKEVKEDVYQLRHEHDA
VITGRRTIEADNPLYTTRVPDGKHPIRVILSKKGQLDFNQQIFKDTASEIWIYTENEKLK
TNKSFIKIINISNCDTTTILQDLYQRGIGKLLVEAGPNITSQFLQSKHLNELILYIAPKL
IGGSGKHQFYKTDEVIDLPEATQFEIVDSKLINQNLKLKLRKK* 

Sequence 1965 

Contig_0687_pos_6265_5627, 

is similar to (with p-value 1.0e-44)

>sp:sp|P16440|RISA_BACSU RIBOFLAVIN SYNTHASE ALPHA CHAIN (EC
2.5.1.9). >pir:pir|S45544|A35711 riboflavin synthase (EC 2.
5.1.9) alpha chain - Bacillus subtilis >gp:gp|L09228|BACDIA_
11 Bacillus subtilis spoVA to serA region. NID: g410114. >gp
:gp|X51510|BSRIB_3 B.subtilis riboflavin biosynthesis operon
ribG, ribB, ribA, ribH, and ribT genes. NID: g40083. >gp:gp
|Z99116|BSUB0013_39 Bacillus subtilis complete genome (secti
on 13 of 21): from 2395261 to 2613730. NID: g2634723.
atgtctatgtttacaggtatcattgaagaaataggtactgtacaacaagttcgctctgaa 

caatcagtaagaacgcttgaaattaaagcacaaaacattttagttgatatgcatattggt
gattcaataagtgttaacggtgcatgtttaactgtgatagatttcactgactcaagtttt 
tcagttcaagtcatcaaagggactgaaaacaaaacatatcttggaagtgttcaacgtaat 
acagaagttaatctcgaaagagccatgagtggaagtgggagatttggtggacatttcgtg 
ttaggtcatgttgatgagcttggaacaatttctaaaatcaatgaaactgctaactcaaaa 

attatttctattaaaacaactaaaaatattttgaatcaaatggtaaagcaaggttctata
actgtagacggagttagtcttactgtatttgatttacatgattatacttttgatatacat 
cttataccagaaacacgtcgatctactattctttcatctaaaaaagtgggcgacaaagtg 
cacttggagtctgacgtactattcaaatatgttgaaaacatcatgaatcaaaatcaatcg
cagttaacagaagaaaagcttagagcatttggtttttag 


Sequence 1966 

MSMFTGIIEEIGTVQQVRSEQSVRTLEIKAQNILVDMHIGDSISVNGACLTVIDFTDSSF 
SVQVIKGTENKTYLGSVQRNTEVNLERAMSGSGRFGGHFVLGHVDELGTISKINETANSK 
IISIKTTKNILNQMVKQGSITVDGVSLTVFDLHDYTFDIHLIPETRRSTILSSKKVGDKV
HLESDVLFKYVENIMNQNQSQLTEEKLRAFGF*

Sequence 1967 

Contig_0687_pos_2457_1804,
putative peptide of unknown function

atgtggaagtgggaaacagaaaatgacgcaaaaggcgttgttgtcattgctcataatatt 
ttagaacatacaggtagatatgcatatgttatcacgatgttaagacgaaatggttatcac 
gttatcatgggcgatttaccgggacaagggcaaacttcacgagctcaaaagggacaaata 
gatgattttaatacgtatcatgaaaatatattagagtggataaaaatagctaatgaatat 

aaaattccaacatttgttttaggtgtgggactaggtggtctcatcattttaaatctgctt
gagaaaacagaattacctattgagggtatcttgttattttcacctatgttagaactaaag
agagactataaagggcgcaaaaataaattgatttctaatgttggtaaaatttctaaagat 
actagatttaaagttggtataactcctcaagatttaacacgtaatgatgaaattattgaa 
gaaacagcaaatgatggactaatgcttaaaaaggtaacatatagttggtataaccttata 

aatgaaaagatgaaagaaacaatggatcatatcagagatattaaacctatttcagcattg
ataatgtatggtaccaatgataaaatttttcaacaattcattttgtatgattag 

Sequence 1968

MWKWETENDAKGVVVIAHNILEHTGRYAYVITMLRRNGYHVIMGDLPGQGQTSRAQKGQI 
DDFNTYHENILEWIKIANEYKIPTFVLGVGLGGLIILNLLEKTELPIEGILLFSPMLELK
RDYKGRKNKLISNVGKISKDTRFKVGITPQDLTRNDEIIEETANDGLMLKKVTYSWYNLI 
NEKMKETMDHIRDIKPISALIMYGTNDKIFQQFILYD* 

Sequence 1969 
Contig_0687_pos_1802_1392,

putative peptide of unknown function 

gtgtaccctaaatatataattgagtttaaaataaaaaacagtaaattaagatggaaattt 
catatccttgctttccttatcactataaatatttgtaaaaaaggtgatttttttatatac 
gatattttattattgaattataaaaaaatatatgcgataggtgttactataattaaactc 
30 aa t acagt tctctcaaaccat a taaccactaaatctctt a tgcggatgggtatatcagt a 
gagagaatactaggcacactagctgaaaagaatataattgtagaaacagaagtaacagca 
attataaatttagtgattaatggtgcttcgactacaattaatgaaggcaaaaacatttct 
gttataccaatagctgcgccttttgcagttaaaaaggaatctggtatttga 

Sequence 1970

VYPKYIIEFKIKNSKLRWKFHILAFLITINICKKGDFFIYDILLLNYKKIYAIGVTIIKL
NTVLSNHITTKSLMRMGISVERILGTLAEKNIIVETEVTAIINLVINGASTTINEGKNIS 
VIPIAAPFAVKKESGI* 

Sequence 1971

Contig_0688_pos_4702_3980,

is similar to (with p-value 7.0e-53) 

>sp:sp|P42312|YXJA_BACSU HYPOTHETICAL 43.7 KD PROTEIN IN KAT
B 3'REGION. >gp:gp|Z99123|BSUB0020_197 Bacillus subtilis com
plete genome (section 20 of 21): from 3798401 to 4010550. NI
D: g2636240. >gp:gp|Z99124|BSUB0021_6 Bacillus subtilis comp
lete genome (section 21 of 21): from 3999281 to 4214814. NID
: g2636442. >gp:gp|D83026|D83026_20 Bacillus subtilis genome
sequence covering lic-cel region. NID: g1783231.

atgatgtctatgagttcagtttcaggagcaattgtgggcgcttatgtgcaaatgatacct
ggagaacttgtattgacggcaattccacttaatattattaacgcaattatagtttcttgt 
attttgaatcctgtatcagttgaagaacaagaagatgtcgtgtatagcattaaagatcac 
caaactgaaagacaaccatttttctcatttcttggagattcagttttagcagctggaaag 
cttgtattaattatcattgcatttgtcattagctttgtagctttggctgacttaattgat 

agattgattcatttaatcacacatcttattgcaaatggtattggtgtcaaaggtagcttt
ggtcttgatcaaatcttaggcgttttcatgtatccatttgctttactattaggtttaccg 
tttaatgaagcgtgggaagtagcacaacaaatggcgaagaaaattgtaacaaacgaattt 
gttgtgatgggggaaatttctaatcaagtcaatgcgatgacgcctcatcatagagcagtt 
atatcaacatttttagtttcttttgcaaacttttcaactattggaatgattataggtaca
ttgaaaggtattgttgataagaaaacgtcggatttcgtttccaaatatgtaccgatgatg
ttgttagcaggaattttagtatccttacttactgctgcatttgttggattatttgcttgg 
taa 

Sequence 1972

MMSMSSVSGAIVGAYVQMIPGELVLTAIPLNIINAIIVSCILNPVSVEEQEDVVYSIKDH
QTERQPFFSFLGDSVLAAGKLVLIIIAFVISFVALADLIDRLIHLITHLIANGIGVKGSF
GLDQILGVFMYPFALLLGLPFNEAWEVAQQMAKKIVTNEFVVMGEISNQVNAMTPHHRAV
ISTFLVSFANFSTIGMIIGTLKGIVDKKTSDFVSKYVPMMLLAGILVSLLTAAFVGLFAW
*

Sequence 1973 

Contig_0688_pos_3410_2610,

is similar to (with p-value 2.0e-26) 

>sp:sp|P54478|YQFU_BACSU HYPOTHETICAL 32.5 KD PROTEIN IN CCC
A-SODA INTERGENIC REGION. >gp:gp|D84432|BACJH642_146 Bacillu
s subtilis DNA, 283 Kb region containing skin element. NID:
g2627063. >gp:gp|Z99116|BSUB0013_221 Bacillus subtilis compl
ete genome (section 13 of 21): from 2395261 to 2613730. NID:
g2634723.

atgataggttcatttattttctctgcaggtatcaatgcatttgttatttcagggaatttg 
ggtgagggtggtgtcactggtatagccatcgtattatattatgcttttcatatttcaccg 
ggaataaccaatttcgttttaaatgctattttaattattgtgggttataaatatttgagt 
aaacgtagtacatatttaacaatttttgctacagtactcatttcaatctttctaggttta 

actgaaacatggcatgtagaaactgggaatgttgtgattaatgctgtgttcggtgggact
tgtgttggtttaggaattggtattatcgttttagcagggggaacaaccgctggaacggtt 
attcttgcgagaattgttaataaatatttagatattagtacgccttacgctttgttattc 
tttgaccttatcgttgtgcttatttcattgacagaaattcctttagtgaagtgcttagtt 
acagttatgtctttatatataggtacaaaagtgatggaatttgttatagaaggattaaat 

actaaaaaggcaatgactattatatctagtcgccctaatgaggtagcaaaagctattgat
cagcaagttggaagaggattaacaatattaaatggacacggttattacactagagaagaa 
aaagatgtactttacgtagtcatctctaaaacacaagtatctcgtgctaaacgaatcatc 
aaaaatattgacgaaaatgcctttttagttattcatgacgttcgtgatgtatatggtaat 
ggttttttgttagatgagtaa 


Sequence 1974 

MIGSFIFSAGINAFVISGNLGEGGVTGIAIVLYYAFHISPGITNFVLNAILIIVGYKYLS
KRSTYLTIFATVLISIFLGLTETWHVETGNVVINAVFGGTCVGLGIGIIVLAGGTTAGTV
ILARIVNKYLDISTPYALLFFDLIVVLISLTEIPLVKCLVTVMSLYIGTKVMEFVIEGLN
TKKAMTIISSRPNEVAKAIDQQVGRGLTILNGHGYYTREEKDVLYVVISKTQVSRAKRII
KNIDENAFLVIHDVRDVYGNGFLLDE* 

Sequence 1975 

Contig_0688_pos_2251_1454,

is similar to (with p-value 2.0e-89)

>sp:sp|P49938|FHUC_BACSU FERRICHROME TRANSPORT ATP-BINDING P
ROTEIN FHUC. >gp:gp|X93092|BSFHUDBGC_4 B.subtilis fhuDBGC ge
nes. NID: g1070011. >gp:gp|Z99121|BSUB0018_16 Bacillus subti
lis complete genome (section 18 of 21): from 3399551 to 3609
060. NID: g2635827. >gp:gp|AJ223978|BS43KBDNA_12 Bacillus su
btilis 42.7kB DNA fragment from yvsA to yvqA. NID: g2832786.
atgagtcgtttaagtggtgaacaagtgaaaattggctacggtgattctacgattattaat 
aatttggatgtcgcaattcctgatggaaaggttacttctattattggacctaacgggtgt 
gggaaatcaactttattgaaagcgttatctagactattgtcaattaaagaaggtaaaatt 

aatttggatggtaagagtattcatgccacatccacgaaagaaatagctaaaaaaatagca
attttaccacaatcaccagaggtcccagatggacttactgtaggagaacttgtttcttat
gggcgttttccacatcaaaaaggatttggtcgtttaactgcagaagataaaaaagaaatt
gattgggcattgtcagttacaggtacaagtgaatttcgtcatcgtactataaatgattta 
agtggtggacaaagacaacgcgtgtggattgcaatggcactagcccaacgtactgatatt
attttcttagatgaacctacaacttatttagatatttgtcatcaattagaaatattaaat
ttagtcaaaaagctcaacgaagaagaaggttgcactattgtgatggttttacatgacatt 
aatcaagcaattcgcttctcagatcatctcattacgatgaaagctggagatattgttgct 
actggtcaaactgatgaagtgttaactaaggacattttagaaaaggtatttaatattgat 
ggtgttttagatatagatccgagaacagggaaaccaattttagttacttacgatttattc
tgtcagacgtattcgtga 

Sequence 1976 

MSRLSGEQVKIGYGDSTIINNLDVAIPDGKVTSIIGPNGCGKSTLLKALSRLLSIKEGKI 
NLDGKSIHATSTKEIAKKIAILPQSPEVPDGLTVGELVSYGRFPHQKGFGRLTAEDKKEI
DWALSVTGTSEFRHRTINDLSGGQRQRVWIAMALAQRTDIIFLDEPTTYLDICHQLEILN
LVKKLNEEEGCTIVMVLHDINQAIRFSDHLITMKAGDIVATGQTDEVLTKDILEKVFNID
GVLDIDPRTGKPILVTYDLFCQTYS* 

Sequence 1977

Contig_0688_pos_1134_634,

putative peptide of unknown function 

atgcaacatcttataaaaaaacatgtattgaatggcgagtttgaactagttagacagtta 
atgtccgaaacagattttatggaatttgaagaagcatacatctctagtgctcatgaagta 

gagagtatgatgttttatacatgtattctagatatgattaaggtagaagaatcatcagaa
ttacatgatttagcattccttttacttgtttatcctttaagtgaatatgagggcgcacta 
gattcagcttattatcatgcagattcttccataaaacttactgacggaaatgaagtgaag 
agtttattacaaatgttattgcttcatgctattcctgagccagttatttcggataaaaaa 
gcgtttgatgtcgctaaacgaattctaaaactcgatccaagtaataatgttgcacgaaat 

gtacttaaagatacagcaaaacgtatggataatgtagttgtagacattaatgaattgaac
aatcaaagagatgcacgctaa 

Sequence 1978 

MQHLIKKHVLNGEFELVRQLMSETDFMEFEEAYISSAHEVESMMFYTCILDMIKVEESSE 
LHDLAFLLLVYPLSEYEGALDSAYYHADSSIKLTDGNEVKSLLQMLLLHAIPEPVISDKK
AFDVAKRILKLDPSNNVARNVLKDTAKRMDNVVVDINELNNQRDAR* 

Sequence 1979 
Contig_0688_pos_0_409,

putative peptide of unknown function

atggggctgaataaagaagctataaaaattggttttgcctatgtcggcattgttgtcggc 
gcaggattttcaacaggacaagaagtgatgcaatttttcacaccatttggtttatggtca 
tatattggagtgattatctcaggatttatacttggattcataggaagacaagtagctaag 
ataggtactgcatttgaagcgaaaaatcacgagtccacattgcaatatgtgtttggaaaa 

aaatttagtaaagtttttgattatattcttgtctttttcttatttggtatagctgtcact
atgatagccggatcaggttctacttttgagcaaagttttggaattcctacttggttaggc 
gcattaatcatgacagttttgatttacttaacattattattagCGTCAA 

Sequence 1980 

MGLNKEAIKIGFAYVGIVVGAGFSTGQEVMQFFTPFGLWSYIGVIISGFILGFIGRQVAK
IGTAFEAKNHESTLQYVFGKKFSKVFDYILVFFLFGIAVTMIAGSGSTFEQSFGIPTWLG 
ALIMTVLIYLTLLLASX 

Sequence 1981 
Contig_0690_pos_3925_4464,

is similar to (with p-value 7.0e-61) 

>gp:gp|Z99107|BSUB0004_107 Bacillus subtilis complete genome 
(section 4 of 21): from 600701 to 813890. NID: g2632866. >g 
p:gp|Y15254|BSYERABCDji Bacillus subtilis 13kB DNA fragment,
from yerA to sapB gene. NID: g2577959.

atggtagagatagacacctatccaagctttattactgttgatggtggtgaaggtggtaca 
ggcgctaccttccaagagcttgaagatggtgttggtttaccgttatttacagcacttcct 
atcgtttcaagtatgttagaaaagtatggcataagaaacaaggttaaaatttttgcgtcc 
ggtaaattagtgactccagataaaatcgcaattgcattaggattaggtgcggatctcgtc
aatattgctagaggtatgatgataagtgtaggatgcatcatgagtcaacaatgtcattta
aatacatgtccagttggagtagcaacaaccgatcctaaaaaagaaaagggacttattgtt 
gatgaaaaacaataccgtgttacaaattatgttacaagtttgcatgaaggtttatttaac 
atcgctgcagctgtaggtgttcatagtccaacggagattacttccgaccatattatctat 
agacaattagatggcactacaacgtccattcaggattataaacttaaattaatttcttaa



Sequence 1982 

MVEIDTYPSFITVDGGEGGTGATFQELEDGVGLPLFTALPIVSSMLEKYGIRNKVKIFAS
GKLVTPDKIAIALGLGADLVNIARGMMISVGCIMSQQCHLNTCPVGVATTDPKKEKGLIV
DEKQYRVTNYVTSLHEGLFNIAAAVGVHSPTEITSDHIIYRQLDGTTTSIQDYKLKLIS* 



Sequence 1983 
Contig_0690_pos_3157_2813,

putative peptide of unknown function 

gtgacaaaccggaggaaggtggggatgacgtcaaatcatcatgccccttatgatttgggc
tacacacgtgctacaatggacaatacaaagggcagcgaaaccgcgaggtcaagcaaatcc
cataaagttgttctcagttcggattgtagtctgcaactcgactatatgaagctggaatcg
ctagtaatcgtagatcagcatgctacggtgaatacgttcccgggtcttgtacacaccgcc
cgtcacaccacgagagtttgtaacacccgaagccggtggagtaaccatttggagctagcc
gtcgaaggtgggacaaatgattggggtgaagtcgtaacaaggtag 

Sequence 1984 

VTNRRKVGMTSNHHAPYDLGYTRATMDNTKGSETARSSKSHKVVLSSDCSLQLDYMKLES
LVIVDQHATVNTFPGLVHTARHTTRVCNTRSRWSNHLELAVEGGTNDWGEVVTR* 

Sequence 1985 

Contig_0690_pos_0_685,

is similar to (with p-value 5.0e-96)

>gp:gp|AF029225|AF029225_1 Staphylococcus carnosus NarG, Nar
H, NarJ, and NarI genes, complete cds. NID: g3929521.
atgaagcacttgttaggtgcgcgctctggtttaatggcagagccaaatgaagatgataaa 
ccagaggaaattaaatggcgcgaggatacagaagggaaacttgatttattagtatcactt
gatttcagaatgactgcgacgccattatattcagatatcgttttacctgctgcaacttgg
tatgaaaaacatgatttatcttctacagacatgcatccatttattcatccatttaaccca 
gcgattgacccattatgggaatcgcgttcggactgggatatttataaaactctaagtaaa 
gctgtttcagaaatggcgaaagattatcttccaggtaaatttaaagatgtcgtaactaca 
ccattaggacatgattcaaaacaagaaatttcaactgaatacggtattgtaaaagattgg 

tctaaaggagaaattgaaggtgtgccaggtaaaacaatgcctaatttttctatcgtagag
cgagactatacacaaatttacgataaattcgttactgttggtccaaaactagaaaaaggg 
aaaataggtgctcatggtgtgagttatagcgttagtgaagagtacgaagaacttaaaagt 
atagttggaacttggaatgatgataatactatttcagttaaaaatgatagaccgagaata 
gatacagcgagaaaagtagcagaGG 


Sequence 1986 

MKHLLGARSGLMAEPNEDDKPEEIKWREDTEGKLDLLVSLDFRMTATPLYSDIVLPAATW 
YEKHDLSSTDMHPFIHPFNPAIDPLWESRSDWDIYKTLSKAVSEMAKDYLPGKFKDVVTT
PLGHDSKQEISTEYGIVKDWSKGEIEGVPGKTMPNFSIVERDYTQIYDKFVTVGPKLEKG
KIGAHGVSYSVSEEYEELKSIVGTWNDDNTISVKNDRPRIDTARKVAEX

Sequence 1987 

Contig_0691_pos_3686_4 519, 
is similar to (with p-value 1.0e-59) 
>gp:gp|U40604|LMU40604_6 Listeria monocytogenes ClpC ATPase
(mec) gene, complete cds. NID: g1314293.

atgagacgtagtgcggtagaaatattatttgctacaattggtttaattattggtttattt
atttcagtgatggtttcttttatcttagaaatgataggtaattccatattaaatcacttt 
gtacctatgataatcactattattttatgttatttagggtttcaatttggtctgaaaaaa
agagatgaaatgcttatgtttttaccagagaatatggcacgttccatgtctaataatata
cgaagagcgacacctaagattgtagatacaagtgccattatcgatggaaggatattagat 
attatacgttgcggatttatcgatggtgatatattgataccacaaggcgttataaatgaa 
ttacaggttatagcggatgctaaagatagcgtgaaacgtgaaaaaggtcaaagaggatta 
gatattttgaatcaactttatgatttagattatcctacacgcgttatacatccaactcaa
tcccatagtgatatagatacattattaattaaattagcacaacagtatcatgcacatgtg 
attacgactgattttaatttaaataaagtatgtcacgttcaaggaattacagcactcaac 
gttaatgatttatcggaagcaatcaaacctaatgtacatcaaggcgaccagttaagtatt 
ttattaacgaagataggtaaagagccaggacaaggcgtaggatatttagatgatggtaca 
atggtggttgttgataacgcgaagagttacattggtcaacaagttaatttagaggtcgta
agtttgttacaaacatcatcaggaagaattgtttttgcgaaatttgttgactga 

Sequence 1988 

MRRSAVEILFATIGLIIGLFISVMVSFILEMIGNSILNHFVPMIITIILCYLGFQFGLKK
RDEMLMFLPENMARSMSNNIRRATPKIVDTSAIIDGRILDIIRCGFIDGDILIPQGVINE
LQVIADAKDSVKREKGQRGLDILNQLYDLDYPTRVIHPTQSHSDIDTLLIKLAQQYHAHV
ITTDFNLNKVCHVQGITALNVNDLSEAIKPNVHQGDQLSILLTKIGKEPGQGVGYLDDGT
MVVVDNAKSYIGQQVNLEVVSLLQTSSGRIVFAKFVD*

Sequence 1989

Contig_0692_pos_2251_3219,

putative peptide of unknown function 

gtgggtagggaaattccaccacatacttataaatttaaagatcaagaaacatatgaaagc 
ttgataaggaatttagctcttcatcaaggtaaaaaaatctactttcaatatattcatgat 

gaagatattttacctaaagaatattatgcacttgataaagatgtttttgttgctcttaat
aataaagcacgaattccagaatggactaataacaaatatctaccacaaagagaaattgtc 
tcaattaaagattttgaatctcatattcaagcatggtcgtatccatttgtcataaaacca
ggcgatgatttacctacagcaggaggatatggtgttatgatttgttataatgatacagat 
ttagctaaagctatcacacgcatcaacaatgcatcagcagagactgaaaatttaatcatt 

gaacaaaaaattaatgcagtgaataactattgtgtacaatttgcttattcagatgatatt
ggtatcaaatacttaggaacagcgcaacagttaactaatgactatggattttacaacgga 
aatgaaaatgttaatgatgtgcctcagaatgtaatagacgctggtagagagattatggaa 
ataggcgtaagcaaaggtttttttggtgtagcaggttttgacttactagtagatgataat 
aatgatgtttatgcgattgatttaaactttaggcaaaacggatcaacgagtatgctactt
ttagcaaaagatttaactcatggatatcataaattttacagttacttttctaatggagat
aatacaaaattctataatgctattttaaaatacgtagaattaggtgtactttatccactt 
tcctattacgatggagattggtatggaaagaatcaagttaattctagatttggctgcatt 
tggcatggggaaaataaagaattaattaatcgatatgaacaacaatttatattggaagct 
ggattataa 

Sequence 1990 

VGREIPPHTYKFKDQETYESLIRNLALHQGKKIYFQYIHDEDILPKEYYALDKDVFVALN 
NKARIPEWTNNKYLPQREIVSIKDFESHIQAWSYPFVIKPGDDLPTAGGYGVMICYNDTD
LAKAITRINNASAETENLIIEQKINAVNNYCVQFAYSDDIGIKYLGTAQQLTNDYGFYNG 
NENVNDVPQNVIDAGREIMEIGVSKGFFGVAGFDLLVDDNNDVYAIDLNFRQNGSTSMLL
LAKDLTHGYHKFYSYFSNGDNTKFYNAILKYVELGVLYPLSYYDGDWYGKNQVNSRFGCI 
WHGENKELINRYEQQFILEAGL* 

Sequence 1991

Contig_0692_pos_3392_4057,

putative peptide of unknown function 

atgtcatacaaatatgaagcattttttaaagatattttgattaatgaatatatttatttt 
gcttcaaaaaataaaaaattaattagaatacaacatgagaatttgccatatattgctatg 
tggacagacgaaaatgttgctgagtcttatttgttacatcattcaattgattacgacaaa
atcattagagcagatattgaccgttttgtaacatatgaaatggatgaaatctttgatcca
ggtgacaaagttttagttaatgtgaataatggtgaagaaggaaacattgtagatatagtt
aaaatgactgatgagttgatgtctgaattagatgatataagaatgagagagtttattaaa
gatgtcgcaaaatatgacgaagtatacggattgacaaacaaaggtgaaaagaattttatt 
atgatttcagatgatgaccataacaaaccacacatcatgcctgtttggagtattaagagt 
agagcgcgtaaagtacgtgatcaagattttgaagaatgtgatttaatcgaaattgaaggt
gaagtctttagtgaatggttagacaagttacgcgatgataataaagcagtagcgattgat 
ttgaaatcaggtgttgttggtactgttgtatcagcgcaaaaactgtcaaatgaagcaaca 
ttttaa 

Sequence 1992 

MSYKYEAFFKDILINEYIYFASKNKKLIRIQHENLPYIAMWTDENVAESYLLHHSIDYDK 
IIRADIDRFVTYEMDEIFDPGDKVLVNVNNGEEGNIVDIVKMTDELMSELDDIRMREFIK
DVAKYDEVYGLTNKGEKNFIMISDDDHNKPHIMPVWSIKSRARKVRDQDFEECDLIEIEG
EVFSEWLDKLRDDNKAVAIDLKSGVVGTVVSAQKLSNEATF*

Sequence 1993 

Contig_0692_pos_4711_4280,

putative peptide of unknown function 

atgatacaaggtttaggctatttattgtccaatataacagattataaagaattaacgaat
ttagctcaaaatggagatcgtgatgccattgatttaaaagtaaaacatatttataaagat 
actgaaccaccaattcctggagatttaacagcagcaaattttggaaatgtattacatcac 
ttagataatcagtttacatcagctaacaaacttgcctctgcaattggcgtcgttggtgaa 
gttataacaactatggctattacattagcacgtgaatataagactaagcacgttgtatat
atcggttcatcatttaataacaatcaattactacgtgaagttgttgaaaattacactgtt
ctaagaggatttaaaccgtactatattgagaatggtgctttttcaggcgctttaggagca 
ctttacctctaa 

Sequence 1994 

MIQGLGYLLSNITDYKELTNLAQNGDRDAIDLKVKHIYKDTEPPIPGDLTAANFGNVLHH
LDNQFTSANKLASAIGVVGEVITTMAITLAREYKTKHVVYIGSSFNNNQLLREVVENYTV 
LRGFKPYYIENGAFSGALGALYL* 

Sequence 1995

Contig_0693_pos_7050_7454,

putative peptide of unknown function 

atgataaaagatggtattggagcaatagctcctcttgggagtggggaaacttatggctac 
catactttagatcaacatatacaagattaccctcataatgtaactagatttttagtagtg 
aaaaatcatactcattttattgaacatccaaacacaactatcttccttatcacgcctaag 
tatgataagccaggacttttagctagtgttttaaatacttttactttattcaacataaat
ttatcgtggattgaatctagaccacttaaaactcaattaggtatgtatcatttttatgtt 
caagccgatactgctataaataatgatgtgaataaaattatttcaattttagagactttg 
gattttcaagttaaaattatcggcgcttttaataagaaaaactaa 

Sequence 1996

MIKDGIGAIAPLGSGETYGYHTLDQHIQDYPHNVTRFLVVKNHTHFIEHPNTTIFLITPK 
YDKPGLLASVLNTFTLFNINLSWIESRPLKTQLGMYHFYVQADTAINNDVNKIISILETL 
DFQVKIIGAFNKKN*

Sequence 1997

Contig_0693_pos_7551_8120,

putative peptide of unknown function 

atgagtttttgtaaattcatttactggcttctttatttacaatactttgtattgttggct 
tgctttcttatgagacagcttcagcctataaaagggaatgacaaaaatttaaacgtaact
agcaaggtgaacaatcaacgttgggtaattacagagcgtcaaacaagtccacatattaat
tttagaatacaaggtaaagttagtaatcacattacgattacagtacctaaatatattaaa 
aacatagatattaaaactaatgccggggatttaaatattgttggagtaaatagtggcaca 
ggaagatttgatgctgaatctggagacattaaagttcaaaaaggacgatataaaaaggtg 
acacttcataatgaggatggggatattcaaatgaaacaattagaccctgatattccttta
cgtattaaaaatgaagaaggggatataaacttgaattataaaaaagaacttcatcacacc
caaatcatcactcgtaatgaagaaggggaaacagacatcgatcatcgtgtgttatataat 
agtaaagtactatttagtgacgtttcataa 

Sequence 1998 
MSFCKFIYWLLYLQYFVLLACFLMRQLQPIKGNDKNLNVTSKVNNQRWVITERQTSPHIN
FRIQGKVSNHITITVPKYIKNIDIKTNAGDLNIVGVNSGTGRFDAESGDIKVQKGRYKKV
TLHNEDGDIQMKQLDPDIPLRIKNEEGDINLNYKKELHHTQIITRNEEGETDIDHRVLYN 
SKVLFSDVS* 

Sequence 1999 

Contig_0693_pos_9026_9604,

is similar to (with p-value 4.0e-47) 

>sp:sp|P42085|XPT_BACSU XANTHINE PHOSPHORIBOSYLTRANSFERASE (
EC 2.4.2.-). >pir:pir|S51309|S51309 xanthine phosphoribosylt
ransferase - Bacillus subtilis >gp:gp|L77246|BACYACA_2 Bacil
lus subtilis (YAC10-9 clone) DNA region between the serA and
kdg loci. NID: g1256615. >gp:gp|Z99115|BSUB0012_148 Bacillu
s subtilis complete genome (section 12 of 21): from 2195541
to 2409220. NID: g2634478. >gp:gp|X83878|BSXPTPBUX_1 B.subti
lis xpt and pbuX genes. NID: g633168.

gtggagtcgttaggacgaaaagtcaaagaagatggcgttgtcatcgatgagaaaattttg 
aaggtagatggatttttaaatcatcaaattgatgcaaagttgatgaatgatgtaggtaaa 
acattttatgagtctttcaaagacgctggtattactaaaattttaactattgaagcttct
ggtattgcgcctgctattatggcttcttttcattttgatgttccttgtctatttgctaaa
aaagctaaacctagtactttgaaagatggcttttatagcacggatattcattcatttaca
aaaaataaaacgagtacagtcattgtatctgaagaatttttaggtgcagacgataaagta 
cttatcattgatgactttttagctaatggtgatgcttcgctaggtcttaatgacattgta 
aaacaagcaaatgcgacgacagttggcgtgggtattgtggttgaaaaaagtttccaaaat
ggtcgccaacgtttagaagatgcaggcttatatgtatcttcactttgtaaggtagcttca
ttaaaaggcaataaggtaactcttttaggtgaagcgtaa 

Sequence 2000 

VESLGRKVKEDGVVIDEKILKVDGFLNHQIDAKLMNDVGKTFYESFKDAGITKILTIEAS 
GIAPAIMASFHFDVPCLFAKKAKPSTLKDGFYSTDIHSFTKNKTSTVIVSEEFLGADDKV
LIIDDFLANGDASLGLNDIVKQANATTVGVGIVVEKSFQNGRQRLEDAGLYVSSLCKVAS
LKGNKVTLLGEA* 

Sequence 2001

Contig_0693_pos_9643_10872,

is similar to (with p-value 8.0e-95) 

>sp:sp|P42086|PBUX_BACSU XANTHINE PERMEASE. >pir:pir|S51310|
S51310 xanthine permease - Bacillus subtilis >gp:gp|L77246|B
ACYACA_3 Bacillus subtilis (YAC10-9 clone) DNA region betwee
n the serA and kdg loci. NID: g1256615. >gp:gp|Z99115|BSUB00
12_147 Bacillus subtilis complete genome (section 12 of 21):
from 2195541 to 2409220. NID: g2634478. >gp:gp|X83878|BSXPT
PBUX_2 B.subtilis xpt and pbuX genes. NID: g633168.

atgtatgcaggggctattcttgttcctattattgtggggacaagcttaaaattttcagct
gaagaaattgcttatctagttactgttgatatatttatgtgcggggtagcgacatttctt
caagcaaataaagtcacagggactggattaccgattgtactaggatgtacgtttactgcc 
gttgcacctatgatactcatcggtcaaacgaaaggacttgatgttttatatggttcgctt 
ttaatatccggtatcttagttgttttaattgcaccttttttctcttatttagttaaattc 
tttccacctgttgtaacaggaagtgttgtgacaattattggaatcaatttaatgccagtt
gcaatgaattacttggcaggtggtgaaggagcgaaaaactatggcgatactaagaattta
atattaggtggtgttacactactcattattcttattttgcaaagatttacaaagggcttc 
ttgaaatcaattgcgatacttataggattagcaataggtactgctttagctggtatattt
ggaatggttgatatcaaacaagtgggtgatgcacattggtttggtttccctgtgccattc 
agattttctggcttcggatttgatgtcagctcaatacttgtatttttcattgttgcagtt
gtaagtttaattgaatctactggtgtctatcatgcactgagtgaaattactggtagaaaa
ctagaaagaaaagattttcgaaaagggtacactgcggaaggtctagcaatcattttaggt 
tcaatatttaatgcgttcccttacactgcatattcccaaaatgtaggtcttgtttcttta
tcaggagctaaaaagaacaatgtgatatatggaatggttattcttttactaatttgcggt
tgtatacctaaattaggtgctttagctaatattattccattgccggttttaggtggagca 
atgatagcaatgtttggaatggttatggcatacggcgttagtattttgggtaacattaat
ttccaaaatcaaaataatttattaattattgcaatttcagtagggttaggtgctggtatt 
agtgcagtacctcaagcatttaaaggattaggagaacaatttgcttggttaactcaaaat 
ggtatagtgcttggcgcaatttctgcaatcatcttaaatttcttttttaatggtataaag 
tataaacaaactgaagaaaatgtgaaataa

Sequence 2002 

MYAGAILVPIIVGTSLKFSAEEIAYLVTVDIFMCGVATFLQANKVTGTGLPIVLGCTFTA 
VAPMILIGQTKGLDVLYGSLLISGILVVLIAPFFSYLVKFFPPVVTGSVVTIIGINLMPV
AMNYLAGGEGAKNYGDTKNLILGGVTLLIILILQRFTKGFLKSIAILIGLAIGTALAGIF
GMVDIKQVGDAHWFGFPVPFRFSGFGFDVSSILVFFIVAVVSLIESTGVYHALSEITGRK
LERKDFRKGYTAEGLAIILGSIFNAFPYTAYSQNVGLVSLSGAKKNNVIYGMVILLLICG
CIPKLGALANIIPLPVLGGAMIAMFGMVMAYGVSILGNINFQNQNNLLIIAISVGLGAGI 
SAVPQAFKGLGEQFAWLTQNGIVLGAISAIILNFFFNGIKYKQTEENVK* 

Sequence 2003 

Contig_0693_pos_10910_11434, 

is similar to (with p-value 2.0e-63) 

>sp:sp|P21879|IMDH_BACSU INOSINE-5'-MONOPHOSPHATE DEHYDROGEN
ASE (EC 1.1.1.205) (IMP DEHYDROGENASE) (IMPDH) (IMPD). >pir:
pir|S12623|DEBSMP IMP dehydrogenase (EC 1.1.1.205) - Bacillu
s subtilis >gp:gp|X55669|BSIMPDE_1 Bacillus subtilis guaB ge
ne for IMP dehydrogenase. NID: g39958.

atgtgggaaaataaatttgctaaagaatctttaacattcgacgacgtgttactcattcca 
gctgcatcagatgttttaccaagcgatgttgacttaagtgtcaaattatcagataagatc
aagttaaacattcctgttatctcagcaggtatggatacagtaactgaatcaaaaatggca 
attgctatggctcgacaaggcggtttaggtgttattcataagaatatgggcgtcgaagag 
caagctgatgaggtacaaaaggttaaacgttcagaaaatggtgttatttctaacccgttc 
ttcttaacaccggaagaaagtgtgtatgaggctgaagcattaatgggtaaataccgtatc 
tctggtgtacccattgtcgataatcaagaggatcgcaagttgattgggattttaacaaat
cgtgatttacgttttattgaagatttttcaattatcatagtcaatatttctattaattcc 
tctaaaggcattccaattgttttggcttcttttgacatagagtga 

Sequence 2004 

MWENKFAKESLTFDDVLLIPAASDVLPSDVDLSVKLSDKIKLNIPVISAGMDTVTESKMA
IAMARQGGLGVIHKNMGVEEQADEVQKVKRSENGVISNPFFLTPEESVYEAEALMGKYRI 
SGVPIVDNQEDRKLIGILTNRDLRFIEDFSIIIVNISINSSKGIPIVLASFDIE* 

Sequence 2005

Contig_0693_pos_17655_16912,

is similar to (with p-value 3.0e-97) 

>sp:sp|P50849|PNPA_BACSU POLYRIBONUCLEOTIDE NUCLEOTIDYLTRANS
FERASE (EC 2.7.7.8) (POLYNUCLEOTIDE PHOSPHORYLASE) (PNPASE).
>gp:gp|Z99112|BSUB0009_139 Bacillus subtilis complete genom
e (section 9 of 21): from 1598421 to 1807200. NID: g2633902.
>gp:gp|U29668|BSU29668_2 Bacillus subtilis ribosomal protei
n RpsO (rpsO) gene, partial cds, and polynucleotide phosphor
ylase (pnpA) gene, complete cds. NID: g1184678.

atggatgccggtgtaccaattaaagcgccagtcgcagggattgcaatgggactagtaacg
cgtgacgatagctatacaattttaactgatattcaaggaatggaagatgcattaggtgat
atggacttcaaagtagcaggtactaaagacggtattactgcgattcaaatggatattaaa 
attgatggtttaactcgagaagttattgaagaagcactagaacaagcgcgtcaaggacga 
ttagctattatggatcatatgcttcacacgattgaacaaccacgcgaagaattaagtgct 
tacgcaccaaaagtggtaactatgagtattaatccagataaaattcgagacgtgattgga
ccaggtggtaagaaaatcaatgaaattatcgacgaaactggagttaaattagatattgaa
caagatggtacaatctttataggtgctgtagatcaagcgatgattaaccgtgcaaaagaa 
attatcgaagatattacacgcgaagcggaagttggacaagtatatcatgctaaagtaaaa 
cgtattgaaaagtatggtgctttcgttgaattgttccctggtaaagacgcgttattacac 
atttctcaaatttcacaagaaagaattaataaagtagaagatgttcttaaaattggagat 
acaattgaagtgaaaattactgaaatcgataaacaaggtcgcgttaatgcgtcacataaa
gtattagagcaatctaaaaattaa 

Sequence 2006

MDAGVPIKAPVAGIAMGLVTRDDSYTILTDIQGMEDALGDMDFKVAGTKDGITAIQMDIK
IDGLTREVIEEALEQARQGRLAIMDHMLHTIEQPREELSAYAPKVVTMSINPDKIRDVIG
PGGKKINEIIDETGVKLDIEQDGTIFIGAVDQAMINRAKEIIEDITREAEVGQVYHAKVK
RIEKYGAFVELFPGKDALLHISQISQERINKVEDVLKIGDTIEVKITEIDKQGRVNASHK 
VLEQSKN* 

Sequence 2007

Contig_0693_pos_15366_14407,

putative peptide of unknown function

atgaaagacaacaaacctaataattcgaaattaattcaaacatatttaagtaagaaaact 
ttaagatatggtacagcaagtgcattaacattggcactctatttatttaacagtaacgta
actgtgtatgcggatgaaaatactgcaaaccaaaatcaaggaacatcaccaaaaacttca 
cagacagcacctacaaataatactgaaaatacagatgccacagccataacaacagatcaa 
aataataatgatgaagaagaatacgatgcgtcatatgaacttccaattctttatgtaact 
gtctggctagatgatcaaggaaatattattaaagatgctgtggaagatgctaaaacccct
gcttcagaaaggcaaccggtgaaaattcctgggtaccaacattatagaacttctgtgagt
gacggaattactaagtttatttatcgtaaaattagcactgcacaatcacctatagttgaa
aataatcaacaagataataatacaaataaagttgttgaaacaaccaatcaaaataaagat 
gaagtgaatggaaaagaacaaaatcaagcaaatacttcagtaacaaatacacaaattacc 
aaaaacgagaaagacgaagacacaaaaacactaaagaaagataaagacgagaaagaatct 
aaagacacaaaaacaccaaagaaagacaaagaaaagaaagacataaaaactccgaagaaa
gatagagaagagaaaaaaccagtaataccaaaaagcggcaaagacgagaaagacacaaaa
ataactaagaaagacaaagaagacgaaattacaacaacttccaagaaagataataacaat
gatgtacaagataaattaccggaaacaggtaaaacaaacgatattcaaaatcctgcttta 
ataatgttacttgctggtttaggtttattaggattatttagaaataaaataagagaatag 

Sequence 2008 

MKDNKPNNSKLIQTYLSKKTLRYGTASALTLALYLFNSNVTVYADENTANQNQGTSPKTS 
QTAPTNNTENTDATAITTDQNNNDEEEYDASYELPILYVTVWLDDQGNIIKDAVEDAKTP
ASERQPVKIPGYQHYRTSVSDGITKFIYRKISTAQSPIVENNQQDNNTNKVVETTNQNKD
EVNGKEQNQANTSVTNTQITKNEKDEDTKTLKKDKDEKESKDTKTPKKDKEKKDIKTPKK 
DREEKKPVIPKSGKDEKDTKITKKDKEDEITTTSKKDNNNDVQDKLPETGKTNDIQNPAL 
IMLLAGLGLLGLFRNKIRE* 

Sequence 2009

Contig_0693_pos_3675_1084,

putative peptide of unknown function 

gtgaaattaccttatggtgtgcaacaagacgctcatgaagtagaagatgcacttgagttt 
attaatcgtgtaattacacctttatcaccgatttcaacatttgctgcccgtaatccgtgg
gaggggctagaagatgcttcgtttgatcaagtggcacgttggttaaaaagtgtgagggat
gttgacatttatcctaatgcgtctactattcacagagcgattagtaataaagaaatagat 
ttaaaagtatttgaagaacggttggatgaaaatcgtgcgcattataataataggtcacta 
tctgacagtgatatcaacacatatattcaaagagcgaaaaatttaaaaacgattgaagaa 
ggttactttaatacaaaagataacgagaaactggaaaaatgggtacaaactaattttaag
gattataagaaaaaagaagatgtgatagcgcaaagtgctagtgttttcacaaaggaaggt
acacgacttattgatattttaaatgctcatatgattaagtggtctaaattatatgttgat 
gactttcaatcaagttggactatgccaaaaagagaaaaaggattctatcatgcctggcaa 
cgtttagttaaacatgatccattattcacaaaaaaacaacgacttactttagcacatttg 
ccaaatcaagcaaccgaagcaatagagtacgcctttcaagaattaggagtaaaagaagaa
catcgacaatcatatattgagagtcatttattatctttaccaggttgggcaggaatcatg
tatcatcggtcacagacacaaagtaatgatgcgtacttattaacagactatgttgcgatt 
cgtctatcaattgagatggtacttttaaatgaccaccatacaacattattaaaaaaatct 
atatatcttcaaaaaaagttagagcaaatacgttatttgctatttaacatacaaatgaat 
gttgagcagtggttaaatctatcatctaaaaagcaacaagcatacattgaattggggaca 